SELF-TUNING FIXED-POINT LEAST-SQUARES SOLVER
20230412428 ยท 2023-12-21
Inventors
- Junmo Sung (Allen, TX, US)
- Tiexing Wang (Plano, TX, US)
- Yeqing Hu (Allen, TX, US)
- Yang Li (Plano, TX)
- Jianzhong Zhang (Dallas, TX, US)
Cpc classification
International classification
Abstract
A method and device for self-tuning scales of variables for processing in fixed-point hardware. The device includes a sequence of fixed-point arithmetic circuits configured to receive at least one input signal and output at least one output signal. The circuits are preconfigured with control scales associated with each of the input and output signals. A first circuit in the sequence is configured to receive a first input signal having a dynamic true scale that is different from the control scale associated with the first input signal. Each of the circuits is further configured to determine, for each of the output signals, an adaptive scale from the control scale associated with the output signal based on the true scale of the first input signal and the control scale associated with the first input signal, and generate, from the input signal, the output signal having the associated adaptive scale.
Claims
1. An electronic device, comprising: a sequence of fixed-point arithmetic circuits, each of the circuits configured to: receive at least one input signal, and output at least one output signal, wherein the circuits are preconfigured with control scales associated with each of the at least one input and output signals, wherein a first fixed-point arithmetic circuit in the sequence is further configured to receive a first input signal having a dynamic true scale that is different from the control scale associated with the first input signal, and wherein each of the fixed-point arithmetic circuits is further configured to: determine, for each of the at least one output signals, an adaptive scale from the control scale associated with the output signal based on the true scale of the first input signal and the control scale associated with the first input signal, and generate, from the at least one input signal, the at least one output signal having the adaptive scale of the at least one output signal.
2. The electronic device of claim 1, wherein each of the fixed-point arithmetic circuits is further configured to: determine the adaptive scales such that likelihoods of bit underflow and bit overflow are reduced in the generation of the at least one output signal having the adaptive scale of the at least one output signal as compared to a generation of the at least one output signal having the control scale associated with the at least one output signal.
3. The electronic device of claim 1, wherein each of the fixed-point arithmetic circuits is further configured to: determine, for each of the at least one output signals, the adaptive scale from the control scale associated with the output signal by addition or subtraction of a scale tuning factor.
4. The electronic device of claim 3, further comprising: a processor operatively coupled to the fixed-point arithmetic circuits, the processor configured to generate the scale tuning factor using the true scale of the first input signal and the control scale associated with the first input signal.
5. The electronic device of claim 4, wherein the processor is further configured to: generate the scale tuning factor to be one half of the difference between the true scale of the first input signal and the control scale associated with the first input signal.
6. The electronic device of claim 3, wherein each of the fixed-point arithmetic circuits is further configured to: for each of the at least one output signals that represents a result of an operation that includes matrix inversion, subtract the scale tuning factor from the control scale associated with the output signal to determine the adaptive scale, and for each of the at least one output signals that represents a result of an operation that does not include matrix inversion, add the scale tuning factor to the control scale associated with the output signal to determine the adaptive scale.
7. The electronic device of claim 1, wherein: the first fixed-point arithmetic circuit in the sequence is further configured to perform matrix decomposition on the first input signal to generate at least two decomposition matrices as the output signals, and the other fixed-point arithmetic circuits in the sequence are configured to determine a solution to a system of linear equations using the at least two decomposition matrices and the adaptive scales of the at least two decomposition matrices.
8. The electronic device of claim 1, wherein: a system of linear equations is defined by the first input signal and a second input signal that is received by one of the fixed-point arithmetic circuits, the second input signal has a dynamic true scale, and a final fixed-point arithmetic circuit in the sequence is further configured to: generate, as the at least one output signal, a solution to the system of linear equations; and determine the adaptive scale of the solution such that it is different from the true scale of the second input signal.
9. The electronic device of claim 1, wherein: the first input signal is a Hermitian matrix C having the dynamic true scale S.sub.C and the associated control scale N.sub.C, the first fixed-point arithmetic circuit in the sequence is further configured to: perform Cholesky matrix decomposition on C to generate, as the at least one output signal: a lower triangular matrix L having the associated control scale N.sub.L and the adaptive scale S.sub.L, and a vector I.sub.L having the associated control scale N.sub.IL and the adaptive scale S.sub.IL, wherein I.sub.L is a reciprocal of the diagonal elements of L; determine S.sub.L from N.sub.L based on S.sub.C and N.sub.C, and determine S.sub.IL from N.sub.IL based on S.sub.C and N.sub.C, a second fixed-point arithmetic circuit in the sequence is further configured to: receive a second input signal that is a matrix p having a dynamic true scale S.sub.p and the associated control scale N.sub.p, wherein S.sub.p=N.sub.p; perform forward substitution based on p, L, and I.sub.L, to generate, as the at least one output signal, a matrix z that is the solution of p=Lz for z, where z=L.sup.Hx, z having the adaptive scale S.sub.z and the associated control scale N.sub.z such that N.sup.z=N.sub.p; and determine S.sub.z from N.sub.p based on S.sub.C and N.sub.C, and a third fixed-point arithmetic circuit in the sequence is further configured to: perform backward substitution based on z, L, and I.sub.L, to generate, as the at least one output signal, a matrix x that is a solution of z=L.sup.Hx for x, x having the adaptive scale S.sub.x and the associated control scale N.sub.x such that N.sub.x=N.sub.p; and determine S.sub.x from N.sub.p based on S.sub.C and N.sub.C.
10. The electronic device of claim 9, further comprising: a preprocessing circuit configured to: receive a matrix y and a matrix A as inputs, where y and A define a system of linear equations y=Ax, generate the first input signal C such that C=A.sup.HA, and generate the second input signal p such that p=A.sup.Hy.
11. A method of operation of an electronic device comprising a sequence of fixed-point arithmetic circuits configured to receive at least one input signal and output at least one output signal, the method comprising: receiving, at a first fixed-point arithmetic circuit in the sequence, a first input signal having a dynamic true scale that is different from a control scale associated with the first input signal, wherein the fixed-point arithmetic circuits are preconfigured with control scales associated with each of the at least one input and output signals; determining, by each of the fixed-point arithmetic circuits for each of the at least one output signals, an adaptive scale from the control scale associated with the output signal based on the true scale of the first input signal and the control scale associated with the first input signal; and generating, by each of the fixed-point arithmetic circuits from the at least one input signal, the at least one output signal having the adaptive scale of the at least one output signal.
12. The method of claim 11, further comprising: determining, by each of the fixed-point arithmetic circuits, the adaptive scales such that likelihoods of bit underflow and bit overflow are reduced in the generation of the at least one output signal having the adaptive scale of the at least one output signal as compared to a generation of the at least one output signal having the control scale associated with the at least one output signal.
13. The method of claim 11, further comprising: determining, by each of the fixed-point arithmetic circuits for each of the at least one output signals, the adaptive scale from the control scale associated with the output signal by addition or subtraction of a scale tuning factor.
14. The method of claim 13, further comprising: generating, by a processor operatively coupled to the fixed-point arithmetic circuits, the scale tuning factor using the true scale of the first input signal and the control scale associated with the first input signal.
15. The method of claim 14, further comprising: generating, by the processor, the scale tuning factor to be one half of the difference between the true scale of the first input signal and the control scale associated with the first input signal.
16. The method of claim 13, further comprising: subtracting, by each of the fixed-point arithmetic circuits, for each of the at least one output signals that represents a result of an operation that includes matrix inversion, the scale tuning factor from the control scale associated with the output signal to determine the adaptive scale; and adding, by each of the fixed-point arithmetic circuits, for each of the at least one output signals that represents a result of an operation that does not include matrix inversion, the scale tuning factor to the control scale associated with the output signal to determine the adaptive scale.
17. The method of claim 11, further comprising: performing, by the first fixed-point arithmetic circuit in the sequence, matrix decomposition on the first input signal to generate at least two decomposition matrices as the output signals; and determining, by the other fixed-point arithmetic circuits in the sequence, a solution to a system of linear equations using the at least two decomposition matrices and the adaptive scales of the at least two decomposition matrices.
18. The method of claim 11, wherein: a system of linear equations is defined by the first input signal and a second input signal that is received by one of the fixed-point arithmetic circuits, the second input signal has a dynamic true scale, and the method further comprises: generating, by a final fixed-point arithmetic circuit in the sequence as the at least one output signal, a solution to the system of linear equations; and determining, by the final fixed-point arithmetic circuit in the sequence, the adaptive scale of the solution such that it is different from the true scale of the second input signal.
19. The method of claim 11, wherein: the first input signal is a Hermitian matrix C having the dynamic true scale S.sub.C and the associated control scale N.sub.C, and the method further comprises: performing, by the first fixed-point arithmetic circuit in the sequence, Cholesky matrix decomposition on C to generate, as the at least one output signal: a lower triangular matrix L having the associated control scale N.sub.L and the adaptive scale S.sub.L, and a vector I.sub.L having the associated control scale N.sub.IL and the adaptive scale S.sub.IL, wherein I.sub.L is a reciprocal of the diagonal elements of L; determining, by the first fixed-point arithmetic circuit in the sequence, S.sub.L from N.sub.L based on S.sub.C and N.sub.C; determining, by the first fixed-point arithmetic circuit in the sequence, S.sub.IL from N.sub.IL based on S.sub.C and N.sub.C; receiving, at a second fixed-point arithmetic circuit in the sequence, a second input signal that is a matrix p having a dynamic true scale S.sub.p and the associated control scale N.sub.p, wherein S.sub.p=N.sub.p; performing, by the second fixed-point arithmetic circuit in the sequence, forward substitution based on p, L, and I.sub.L, to generate, as the at least one output signal, a matrix z that is the solution of p=Lz for z, where z=L.sup.Hx, z having the adaptive scale S.sub.z and the associated control scale N.sub.z such that N.sub.z=N.sub.p; determining, by the second fixed-point arithmetic circuit in the sequence, S.sub.z from N.sub.p based on S.sub.C and N.sub.C; performing, by a third fixed-point arithmetic circuit in the sequence, backward substitution based on z, L, and I.sub.L, to generate, as the at least one output signal, a matrix x that is a solution of z=L.sup.Hx for x, x having the adaptive scale S.sub.x and the associated control scale N.sub.x such that N.sub.x=N.sub.p; and determining, by the third fixed-point arithmetic circuit in the sequence, S.sub.x from N.sub.p based on S.sub.C and N.sub.C.
20. The method of claim 19, further comprising: receiving, at a preprocessing circuit, a matrix y and a matrix A as inputs, where y and A define a system of linear equations y=Ax; generating, by the preprocessing circuit, the first input signal C such that C=A.sup.HA; and generating, by the preprocessing circuit, the second input signal p such that p=A.sup.Hy.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
DETAILED DESCRIPTION
[0024]
[0025] Embodiments of the present disclosure recognize that digital signal processing algorithms are typically designed using high precision floating-point operations and then implemented in fixed-point (or FxP) hardware, which is often less precise due to design constraints. SQNR is a measurement of the difference in precision between the fixed-point signal processing operation and its floating-point counterpart. One source of lowered SQNR in binary fixed-point implementation is improperly managed bit width and scale of processed data, where bit width refers to the number of bits in a binary number (e.g., the number of bits necessary to represent a decimal value in binary) and scale refers to the number of bits in a binary number that represent the fractional portion of the number. That is, the scale value determines the binary point (or radix point) of a binary fixed-point number, which defines which bits represent an integer portion of the number (integer bits) and which bits represent the fractional portion of the number (fractional bits).
[0026] Embodiments of the present disclosure further recognize that in fixed-point signal processing the least-squares (LS) solver is one of the most difficult and complex processing operations, as it involves matrix inversion which needs fine-tuning depending on the input and output bit widths and scales to avoid bit underflow or overflow. Input and output scales and bit widths refer to the scales and bit widths of the binary input and output, respectively. When the input has a large range of possible bit widths, the output scale needs to vary dynamically due to the nature of matrix inversion to avoid underflow or overflow. In traditional matrix inversion processing implementations, the output scale is tied to the input scale, and underflow or overflow can easily occur at the extremes of a large range of bit widths.
[0027] Accordingly, embodiments of the present disclosure provide methods and apparatuses for implementing binary LS solver operations in fixed-point hardware that accommodates variable bit width inputs and has the self-tuning property. The self-tuning property refers to the capability to adjust the input and output scales of processed data at various arithmetic circuits in the hardware as needed to reduce bit overflow and underflow, thereby improving SQNR.
[0028]
[0029]
[0030] As shown in
[0031] The gNB 102 provides wireless broadband access to the network 130 for a first plurality of user equipments (UEs) within a coverage area 120 of the gNB 102. The first plurality of UEs includes a UE 111, which may be located in a small business; a UE 112, which may be located in an enterprise; a UE 113, which may be a WiFi hotspot; a UE 114, which may be located in a first residence; a UE 115, which may be located in a second residence; and a UE 116, which may be a mobile device, such as a cell phone, a wireless laptop, a wireless PDA, or the like. The gNB 103 provides wireless broadband access to the network 130 for a second plurality of UEs within a coverage area 125 of the gNB 103. The second plurality of UEs includes the UE 115 and the UE 116. In some embodiments, one or more of the gNBs 101-103 may communicate with each other and with the UEs 111-116 using 5G/NR, long term evolution (LTE), long term evolution-advanced (LTE-A), WiMAX, WiFi, or other wireless communication techniques.
[0032] Depending on the network type, the term base station or BS can refer to any component (or collection of components) configured to provide wireless access to a network, such as transmit point (TP), transmit-receive point (TRP), an enhanced base station (eNodeB or eNB), a 5G/NR base station (gNB), a macrocell, a femtocell, a WiFi access point (AP), or other wirelessly enabled devices. Base stations may provide wireless access in accordance with one or more wireless communication protocols, e.g., 5G/NR 3rd generation partnership project (3GPP) NR, long term evolution (LTE), LTE advanced (LTE-A), high speed packet access (HSPA), Wi-Fi 802.11a/b/g/n/ac, etc. For the sake of convenience, the terms BS and TRP are used interchangeably in this patent document to refer to network infrastructure components that provide wireless access to remote terminals. Also, depending on the network type, the term user equipment or UE can refer to any component such as mobile station, subscriber station, remote terminal, wireless terminal, receive point, or user device. For the sake of convenience, the terms user equipment and UE are used in this patent document to refer to remote wireless equipment that wirelessly accesses a BS, whether the UE is a mobile device (such as a mobile telephone or smartphone) or is normally considered a stationary device (such as a desktop computer or vending machine).
[0033] Dotted lines show the approximate extents of the coverage areas 120 and 125, which are shown as approximately circular for the purposes of illustration and explanation only. It should be clearly understood that the coverage areas associated with gNBs, such as the coverage areas 120 and 125, may have other shapes, including irregular shapes, depending upon the configuration of the gNBs and variations in the radio environment associated with natural and man-made obstructions.
[0034] Although
[0035]
[0036] As shown in
[0037] The transceivers 210a-210n receive, from the antennas 205a-205n, incoming RF signals, such as signals transmitted by UEs in the network 100. The transceivers 210a-210n down-convert the incoming RF signals to generate IF or baseband signals. The IF or baseband signals are processed by receive (RX) processing circuitry in the transceivers 210a-210n and/or controller/processor 225, which generates processed baseband signals by filtering, decoding, and/or digitizing the baseband or IF signals. The controller/processor 225 may further process the baseband signals.
[0038] Transmit (TX) processing circuitry in the transceivers 210a-210n and/or controller/processor 225 receives analog or digital data (such as voice data, web data, e-mail, or interactive video game data) from the controller/processor 225. The TX processing circuitry encodes, multiplexes, and/or digitizes the outgoing baseband data to generate processed baseband or IF signals. The transceivers 210a-210n up-convert the baseband or IF signals to RF signals that are transmitted via the antennas 205a-205n.
[0039] The controller/processor 225 can include one or more processors or other processing devices that control the overall operation of the gNB 102. For example, the controller/processor 225 could control the reception of UL channel signals and the transmission of DL channel signals by the transceivers 210a-210n in accordance with well-known principles. The controller/processor 225 could support additional functions as well, such as more advanced wireless communication functions. For instance, the controller/processor 225 could support beam forming or directional routing operations in which outgoing/incoming signals from/to multiple antennas 205a-205n are weighted differently to effectively steer the outgoing signals in a desired direction. Any of a wide variety of other functions could be supported in the gNB 102 by the controller/processor 225.
[0040] The controller/processor 225 or the transceivers 210a-210n may include fixed-point arithmetic circuitry that may perform digital signal processing on digital UL or DL channel signals provided to the fixed-point arithmetic circuitry. For example, the fixed-point arithmetic circuitry may perform a least-squares estimate (using, e.g., a Cholesky decomposition and forward-backward substitution approach, as described below) as part of MIMO zero-forcing (ZF), minimum mean squared error (MMSE) precoding, equalization, channel prediction, or other such digital signal processing algorithms. The fixed-point arithmetic circuitry may include application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or similar hardware implementations of one or more digital signal processing algorithms.
[0041] The controller/processor 225 is also capable of executing programs and other processes resident in the memory 230, such as an OS. The controller/processor 225 can move data into or out of the memory 230 as required by an executing process.
[0042] The controller/processor 225 is also coupled to the backhaul or network interface 235. The backhaul or network interface 235 allows the gNB 102 to communicate with other devices or systems over a backhaul connection or over a network. The interface 235 could support communications over any suitable wired or wireless connection(s). For example, when the gNB 102 is implemented as part of a cellular communication system (such as one supporting 5G/NR, LTE, or LTE-A), the interface 235 could allow the gNB 102 to communicate with other gNBs over a wired or wireless backhaul connection. When the gNB 102 is implemented as an access point, the interface 235 could allow the gNB 102 to communicate over a wired or wireless local area network or over a wired or wireless connection to a larger network (such as the Internet). The interface 235 includes any suitable structure supporting communications over a wired or wireless connection, such as an Ethernet or transceiver.
[0043] The memory 230 is coupled to the controller/processor 225. Part of the memory 230 could include a RAM, and another part of the memory 230 could include a Flash memory or other ROM.
[0044] Although
[0045]
[0046] As shown in
[0047] The transceiver(s) 310 receives, from the antenna 305, an incoming RF signal transmitted by a gNB of the network 100. The transceiver(s) 310 down-converts the incoming RF signal to generate an intermediate frequency (IF) or baseband signal. The IF or baseband signal is processed by RX processing circuitry in the transceiver(s) 310 and/or processor 340, which generates a processed baseband signal by filtering, decoding, and/or digitizing the baseband or IF signal. The RX processing circuitry sends the processed baseband signal to the speaker 330 (such as for voice data) or is processed by the processor 340 (such as for web browsing data).
[0048] TX processing circuitry in the transceiver(s) 310 and/or processor 340 receives analog or digital voice data from the microphone 320 or other outgoing baseband data (such as web data, e-mail, or interactive video game data) from the processor 340. The TX processing circuitry encodes, multiplexes, and/or digitizes the outgoing baseband data to generate a processed baseband or IF signal. The transceiver(s) 310 up-converts the baseband or IF signal to an RF signal that is transmitted via the antenna(s) 305.
[0049] The processor 340 can include one or more processors or other processing devices and execute the OS 361 stored in the memory 360 in order to control the overall operation of the UE 116. For example, the processor 340 could control the reception of DL channel signals and the transmission of UL channel signals by the transceiver(s) 310 in accordance with well-known principles. In some embodiments, the processor 340 includes at least one microprocessor or microcontroller.
[0050] The processor 340 or the transceivers 310 may include fixed-point arithmetic circuitry that may perform digital signal processing on digital UL or DL channel signals provided to the fixed-point arithmetic circuitry. For example, the fixed-point arithmetic circuitry may perform a least-squares estimate (using, e.g., a Cholesky decomposition and forward-backward substitution approach, as described below) as part of MIMO zero-forcing (ZF), minimum mean squared error (MMSE) precoding, equalization, channel prediction, or other such digital signal processing algorithms. The fixed-point arithmetic circuitry may include application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or similar hardware implementations of one or more digital signal processing algorithms.
[0051] The processor 340 is also capable of executing other processes and programs resident in the memory 360. The processor 340 can move data into or out of the memory 360 as required by an executing process. In some embodiments, the processor 340 is configured to execute the applications 362 based on the OS 361 or in response to signals received from gNBs or an operator. The processor 340 is also coupled to the I/O interface 345, which provides the UE 116 with the ability to connect to other devices, such as laptop computers and handheld computers. The I/O interface 345 is the communication path between these accessories and the processor 340.
[0052] The processor 340 is also coupled to the input 350, which includes for example, a touchscreen, keypad, etc., and the display 355. The operator of the UE 116 can use the input 350 to enter data into the UE 116. The display 355 may be a liquid crystal display, light emitting diode display, or other display capable of rendering text and/or at least limited graphics, such as from web sites.
[0053] The memory 360 is coupled to the processor 340. Part of the memory 360 could include a random-access memory (RAM), and another part of the memory 360 could include a Flash memory or other read-only memory (ROM).
[0054] Although
[0055]
[0056]
[0057] In this example, the LS solver 506 is implemented using a Cholesky decomposition and forward-backward (FW-BW) substitution approach (as shown in blocks 5061, 5062, and 5063, which may represent separate fixed-point arithmetic circuits, or portions of an integrated fixed-point arithmetic circuit). However, it is understood that the disclosure is not limited to this approach, and any other LS solver approach could be implemented using the embodiments of the disclosure disclosed below.
[0058] The LS solver 506 solves the following equation for x:
y=Ax(1)
[0059] where A is an MN complex matrix, y is an M1 complex vector, and x is an N1 complex vector. The inputs at 502 are y and A.
[0060] For preprocessing operations at 504, A.sup.H is multiplied to both sides of equation (1) to obtain:
p=Cx(2)
[0061] where C=A.sup.HA is an NN complex Hermitian matrix and p=A.sup.Hy is an N1 complex vector.
[0062] The Cholesky-based LS solver 506 first performs Cholesky decomposition at block 5061 to decompose C in the form of LL.sup.H and find L, where L is an NN complex lower triangular matrix. The Cholesky decomposition block 5061 also generates I.sub.L (an N1 real vector) as a side product that can reduce the number of operations needed to perform the FW-BW substitution of blocks 5062 and 5063. I.sub.L is a vector with elements that are a reciprocal of the diagonal elements of L.
[0063] Once L and I.sub.L are obtained, the FW-BW substitution can be applied to p=LL.sup.Hx (at blocks 5062 and 5063) to determine x. More specifically, forward substitution block 5062 performs forward substitution on p=Lz to find z, where z=L.sup.Hx, and backward substitution block 5063 performs backward substitution on z=L.sup.Hx to find x. It is understood that y and x, thus p and x as well, can readily be expanded from vectors to matrices.
[0064] The scales of variables in a fixed-point implementation are typically determined during the fixed-point hardware design stage and are provided to each fixed-point module (or arithmetic circuit, e.g., blocks 5061, 5062, and 5063). These pre-determined scales are referred to herein as control scales, denoted as N with a subscript that indicates the variable associated with the scale. The provided control scales are used to track and match the scales in internal operations and output generation. That is, each fixed-point module performs its operations assuming that the variables have scale values that correspond to their provided control scale. In the fixed-point Cholesky-based LS solver 506, the following control scales are provided to the Cholesky decomposition and FW-BW substitution blocks: N.sub.C, N.sub.L, N.sub.IL, and N.sub.p.
[0065] The true scale of a variable, denoted herein as S with a subscript that indicates the variable associated with the scale, refers to the actual scale value of the variablethat is, the scale of the variable as used in previous operations performed on that variable. In conventional designs, the control scale values are assumed to correspond to the true scales of the variables (i.e., the control scale is set equivalent to the true scale by design). In the conventional design of fixed-point Cholesky-based LS solver 506, then, S.sub.C=N.sub.C, S.sub.L=N.sub.L, S.sub.IL=N.sub.IL, and S.sub.p=N.sub.p.
[0066]
[0067] The scales associated with the input variables C and p are dynamic in the sense that the true scales of the inputs they are dependent on the source of the inputs. However, because the system is designed to operate under the assumption that S.sub.C=N.sub.C and S.sub.p=N.sub.p, the design of the values of the control scales N.sub.C and N.sub.p is constrained based on the expected true scales of the inputs. The values of N.sub.L and N.sub.IL are freely tunable during the design phase, however, and therefore may be tuned to optimize operations performed by the blocks of the LS solver 600.
[0068] In embodiments of the present disclosure, the input control scale N.sub.C is not constrained to be equivalent to the true scale S.sub.C of its associated input variable C. Accordingly, the values of N.sub.C, N.sub.L, and N.sub.IL are all freely tunable during the design phase and can be arbitrary values of choice. The value of the input control scale N.sub.C being different from the input true scale S.sub.C may have a cascading impact on the scale of all variables in the following operations that is needed to avoid underflow or overflow. This impact will therefore need to be analytically tracked and controlled to avoid underflow or overflow.
[0069]
[0070] In various embodiments, the true scales of the variables in the Cholesky-based LS solver 700 are tracked by determining a dynamic scale difference based on N.sub.C and S.sub.C, and applying the dynamic scale difference to determine adaptive scale values for S.sub.L, S.sub.IL, S.sub.z, and S.sub.x. The dynamic scale difference is denoted herein as .sub.C. An adaptive scale value herein refers to a dynamic true scale value that is determined by adjusting a provided static control scale value using, e.g., the dynamic scale difference value. It is understood that other terminology could be used to refer to the adaptive scale without affecting this disclosure.
[0071] In the embodiment of the example of
and where the subscript i, j denotes the element of the matrix at the row i and column j. In other embodiments, different formulas may be used for similar purposes.
[0072] In computation of the equations (3) and (4) for the diagonal elements L.sub.j,j and the off-diagonal elements L.sub.i,j of L, if N.sub.CS.sub.C, N.sub.LS.sub.L, and N.sub.ILS.sub.IL, then there will need to be two scale changes in order to satisfy conditions requiring matching scales of variables for performing operations or matching the specified output scale. As a result, the true scale of L.sub.j,j and L.sub.i,j will be the adaptive scale S.sub.L=N.sub.L+.sub.C, where
and the true scale of I.sub.L,j will be the adaptive scale S.sub.IL=N.sub.IL.sub.C. These results are derived below.
[0073] In deriving the adaptive scale S.sub.L of L, although the diagonal elements L.sub.j,j and the off-diagonal elements L.sub.i,j are computed using different equations, they need to have matching scales, as all values of L need to have the same scale. For computation of the diagonal element L.sub.j,j for j=1 using equation (3), {square root over (C.sub.1,1)} has the scale of
The adaptive output true scale S.sub.L can then be obtained from the following equation for output scale matching using the output control scale N.sub.L:
[0074] For computation of the diagonal elements L.sub.j,j for j1 using equation (3), first C.sub.j,j and L.sub.j,kL.sub.j,k* must have matching scales to perform the operation C.sub.j,j.sub.k=1.sup.j-1L.sub.j,kL.sub.j,k*. Accordingly, the true scale of L.sub.j,kL.sub.j,k*, which is 2S.sub.L, is changed to 2S.sub.L(2N.sub.LN.sub.C) after scale matching to the true scale of C.sub.j,j, which is S.sub.C, based on the output control scale N.sub.L and the input control scale N.sub.C. Using the previously obtained value of S.sub.L=N.sub.L+.sub.C, it can be confirmed that:
[0075] Next, the scale of {square root over (C.sub.j,j.sub.k=1.sup.j-1L.sub.j,kL.sub.j,k*)}, which is
is scale matched to the specified output scale based on the control scales to become:
[0076] For computation of the off-diagonal elements L.sub.i,j using equation (4), the scale of C.sub.i,j.sub.k=1.sup.j-1L.sub.i,kL.sub.j,k* is S.sub.C=2S.sub.L(2N.sub.LN.sub.C) similarly to the diagonal elements. The result of multiplying C.sub.i,j.sub.k=1.sup.j-1L.sub.i,kL.sub.j,k* by I.sub.L,j according to equation (4) will have the scale S.sub.C+S.sub.IL. Output scale matching based on the control scales will result in the following scale change:
[0077] where S.sub.IL=N.sub.IL.sub.C, as derived below. Therefore, the adaptive scale for all elements in L is S.sub.L=N.sub.L+.sub.C.
[0078] In deriving the adaptive scale S.sub.IL of I.sub.L, for computation of
the scale of
To match the specified output scale and obtain the adaptive output true scale S.sub.IL, the following scale changes are performed.
[0079] The FW substitution circuit 704 follows the Cholesky decomposition circuit 702 and solves p=Lz for z, where z=L.sup.Hx, using the outputs of the Cholesky decomposition, L and I.sub.L, according to the following equation:
[0080] where z has the same provided control scale as p, i.e., N.sub.z=N.sub.p. Satisfying the conditions requiring matching scales of variables for performing operations or matching the specified output scale in the FW substitution operation results in the true scale of z being the adaptive scale S.sub.z=N.sub.p.sub.C. This result is derived below.
[0081] In deriving the adaptive scale S.sub.Z of z, equation (5) can be expressed as z.sub.i=I.sub.L, ip.sub.i for i=1. The scale of I.sub.L, ip.sub.i is S.sub.p+S.sub.IL. After output scale matching, this becomes the adaptive output true scale S.sub.z:
[0082] For computation of z.sub.i for i1 using equation (5), first p.sub.i and L.sub.i,k*z.sub.k must have matching scales to perform the operation p.sub.i.sub.k=1.sup.i1L.sub.i,k*z.sub.k. The true scale of L.sub.i,k*z.sub.k, which is S.sub.L+S.sub.z, is therefore changed to S.sub.L+S.sub.z(N.sub.L+N.sub.zN.sub.p) after scale matching to the true scale of p.sub.i, which is S.sub.p, based on the control scales N.sub.L, N.sub.z, and N.sub.p. Using the previously obtained values of S.sub.L=N.sub.L+.sub.C and S.sub.z=N.sub.p.sub.C, and remembering that S.sub.p=N.sub.p=N.sub.z, it can be confirmed that:
[0083] Then, for computation of I.sub.L, i(p.sub.i.sub.k=1.sup.i1L.sub.i,k*z.sub.k) according to equation (5), the scale of p.sub.i.sub.k=1.sup.i1L.sub.i,k*z.sub.k is S.sub.p, and thus the result of multiplying p.sub.i.sub.k=1.sup.i1L.sub.i,k*z.sub.k by I.sub.L, i will have the scale S.sub.p+S.sub.IL. Output scale matching based on the control scales will result in the following scale change:
[0084] Therefore, the adaptive scale of all elements in z is S.sub.z=N.sub.p.sub.C.
[0085] The BW substitution circuit 706 in turn solves z=L.sup.Hx for x using the outputs of the Cholesky decomposition circuit 702 and the FW substitution circuit 704 blocks (L, I.sub.L, and z) according to the following equation:
[0086] where x has the same provided control scale as z, i.e., N.sub.x=N.sub.z=N.sub.p. Satisfying the conditions requiring matching scales of variables for performing operations or matching the specified output scale in the BW substitution operation results in the true scale of x being the adaptive scale S.sub.x=N.sub.p2.sub.C, which can also be expressed as S.sub.L+S.sub.IL=N.sub.L+N.sub.IL. This result is derived below.
[0087] In deriving the adaptive scale S.sub.x of x, equation (6) can be expressed as x.sub.i=I.sub.L, iz.sub.i for i=1. The scale of I.sub.L, iz.sub.i is S.sub.z+S.sub.IL. After output scale matching, this becomes the adaptive output true scale S.sub.x:
[0088] For computation of x.sub.i for i1 according to equation (6), first z.sub.i and L.sub.k,ix.sub.k must have matching scales to perform the operation z.sub.i.sub.k=i+1.sup.NL.sub.k,ix.sub.k. The true scale of L.sub.k,ix.sub.k, which is S.sub.L+S.sub.x, is therefore changed to S.sub.L+S.sub.x(N.sub.L+N.sub.xN.sub.z) after scale matching to the true scale of z.sub.i, which is S.sub.z, based on the control scales N.sub.L, N.sub.x, and N.sub.z. Using the previously obtained values of S.sub.L=N.sub.L+.sub.C, S.sub.z=N.sub.p.sub.C, and S.sub.x=N.sub.p2.sub.C, and remembering that S.sub.p=N.sub.p=N.sub.x=N.sub.z, it can be confirmed that:
[0089] Then, for computation of I.sub.L, i(z.sub.i.sub.k=i+1.sup.NL.sub.k,ix.sub.k) according to equation (6), the scale of z.sub.i.sub.k=i+1.sup.NL.sub.k,ix.sub.k is S.sub.z, and thus the result of multiplying z.sub.i.sub.k=i+1.sup.NL.sub.k,ix.sub.k by I.sub.L, i will have the scale S.sub.z+S.sub.IL. Output scale matching based on the control scales will result in the following scale change:
[0090] Therefore, the adaptive scale of all elements in x is S.sub.x=N.sub.p2.sub.C.
[0091] As derived above, the true scales of the outputs of the Cholesky decomposition, FW and BW substitution blocks become different than the control scales and are functions of .sub.C. When .sub.C=0, this embodiment devolves to the conventional method wherein the true scales and control scales have the same value, i.e., S.sub.L=N.sub.L, S.sub.IL=N.sub.IL, and S.sub.x=N.sub.p. In this case, S.sub.L and S.sub.IL are fixed values that do not vary with the input scale S.sub.C and the final output scale S.sub.x is tied to the input scale S.sub.p.
[0092] In the present embodiment with .sub.C0, the primary outputs such as L, I.sub.L, and x have adaptive scales S.sub.L=N.sub.L+.sub.C, S.sub.IL=N.sub.IL.sub.C, and S.sub.x=N.sub.p2.sub.C, which can be exploited to make desirable adjustments to the output scales. The control scales in this case function as anchor points, and .sub.C allows adjustment of the true scales S.sub.L, S.sub.IL, and S.sub.x of the outputs and is determined by both the control input scale N.sub.C and the true input scale S.sub.C (i.e., .sub.C varies with the input scale S.sub.C). Adjustments may be made to the output scales in order to reduce chances of bit overflow and underflow that would occur in the conventional method. This is referred to as the self-tuning property.
[0093] Examples of the benefits provided by a self-tuning fixed-point LS solver follow, in the context of the Cholesky-based LS solver 700 that solves equation (2), p=Cx, for x. For a given input p, the magnitude of x is inversely proportional to the magnitude of C. Likewise, for a given input C, the magnitude of x is inversely proportional to the magnitude of p. For a variable having a given bit width, larger magnitude data needs a smaller scale (as higher integer representation is necessary while less fractional precision is necessary) and smaller magnitude data needs a larger scale (as more fractional precision is necessary while lower integer representation is necessary)i.e., magnitude is inversely proportional to the required scale.
[0094]
[0095] When S.sub.C increases, this means that the magnitude of C has decreased. In the Cholesky decomposition circuit 702, when computing C=LL.sup.H to find L and I.sub.L, a decrease in the magnitude of C means the magnitude of L will decrease and the magnitude of I.sub.L will increase, therefore the required scale for L will increase and the required scale for I.sub.L will decrease (where required scale means the scale needed to avoid underflow and overflow). The embodiments of the present disclosure may accommodate these changes in the required scales for L and I.sub.L due to capability of using adaptive scales S.sub.L and S.sub.IL.
[0096] Following on from the Cholesky decomposition circuit 702, in the FW substitution circuit 704, when computing p=Lz to find z, a decrease in the magnitude of L (and increase in the magnitude of I.sub.L) means the magnitude of z will increase (as the magnitude of z is inversely proportional to the magnitude of L and proportional to the magnitude of I.sub.L), and thus the required scale for z will decrease. Similarly, in the BW substitution circuit 706, when computing z=L.sup.Hx to find x, a decrease in the magnitude of L (and increase in the magnitude of I.sub.L) means the magnitude of x will increase (as the magnitude of x is inversely proportional to the magnitude of L and proportional to the magnitude of I.sub.L), and thus the required scale for x will decrease. The embodiments of the present disclosure may accommodate these changes in the required scales for z and x due to capability of using adaptive scales S.sub.z and S.sub.x.
[0097] By comparison, in the case when .sub.C=0 (i.e., using the conventional method with fixed scales), there will be a higher chance of underflow in the computation of L and a higher chance of overflow in the computation of I.sub.L because S.sub.L and S.sub.IL are fixed (to N.sub.L and N.sub.IL, respectively). Additionally, there will be a higher chance of overflow in the computation of z and x, as S.sub.z and S.sub.x are fixed (to N.sub.p).
[0098]
[0099] In the example of
[0100] The process begins by receiving, at the first fixed-point arithmetic circuit in the sequence, a first input signal having a dynamic true scale that is different from a control scale associated with the first input signal (step 905).
[0101] At step 910 of the process, each of the fixed-point arithmetic circuits determines, for each of the at least one output signals, an adaptive scale from the control scale associated with the output signal based on the true scale of the first input signal and the control scale associated with the first input signal. The adaptive scales are determined at step 910 such that likelihoods of bit underflow and bit overflow are reduced in the generation of the at least one output signal having the adaptive scale of the at least one output signal as compared to a generation of the at least one output signal having the control scale associated with the at least one output signal.
[0102] In some embodiments, each of the fixed-point arithmetic circuits at step 910 determines, for each of the at least one output signals, the adaptive scale from the control scale associated with the output signal by addition or subtraction of a scale tuning factor (e.g., ). For example, each of the fixed-point arithmetic circuits subtracts, for each of the at least one output signals that represents a result of an operation that includes matrix inversion, the scale tuning factor from the control scale associated with the output signal to determine the adaptive scale. Each of the fixed-point arithmetic circuits adds, for each of the at least one output signals that represents a result of an operation that does not include matrix inversion, the scale tuning factor to the control scale associated with the output signal to determine the adaptive scale.
[0103] In such embodiments, a processor operatively coupled to the fixed-point arithmetic circuits may, at step 910, generate the scale tuning factor using the true scale of the first input signal and the control scale associated with the first input signal. In particular, the scale tuning factor may be one half of the difference between the true scale of the first input signal and the control scale associated with the first input signal.
[0104] The process concludes at step 915, where each of the fixed-point arithmetic circuits generates, from the at least one input signal, the at least one output signal having the adaptive scale of the at least one output signal.
[0105] In the process 900 a system of linear equations may be defined by the first input signal (e.g., C) and a second input signal (e.g., p) that is received by one of the fixed-point arithmetic circuits (e.g., the forward substitution circuit), wherein the second input signal has a dynamic true scale. In this case a final fixed-point arithmetic circuit in the sequence (e.g., the backward substitution circuit) generates, as the at least one output signal, a solution to the system of linear equations, and determines the adaptive scale of the solution such that it is different from the true scale of the second input signal.
[0106] In some embodiments of process 900 the first fixed-point arithmetic circuit in the sequence (e.g., the Cholesky decomposition circuit) performs matrix decomposition on the first input signal to generate at least two decomposition matrices as the output signals (e.g., L and I.sub.L). The other fixed-point arithmetic circuits in the sequence then determine the solution to a system of linear equations using the at least two decomposition matrices and the adaptive scales of the at least two decomposition matrices.
[0107] In some cases, the fixed-point arithmetic circuitry also includes a preprocessing circuit that preprocesses inputs to the LS solver circuitry. For example, when the fixed-point arithmetic circuits include a Cholesky decomposition circuit, a forward substitution circuit, and a backward substitution circuit, the preprocessing circuit may receive a matrix y and a matrix A as inputs, where y and A define a system of linear equations y=Ax, and may then generate the first input signal C such that C=A.sup.HA and generate the second input signal p such that p=A.sup.Hy.
[0108] The above flowchart illustrates an example method or process that can be implemented in accordance with the principles of the present disclosure and various changes could be made to the methods or processes illustrated in the flowcharts. For example, while shown as a series of steps, various steps could overlap, occur in parallel, occur in a different order, or occur multiple times. In another example, steps may be omitted or replaced by other steps.
[0109] Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims. None of the description in this application should be read as implying that any particular element, step, or function is an essential element that must be included in the claims scope. The scope of patented subject matter is defined by the claims.