G06F2207/5523

Hybrid accumulation method in multiply-accumulate for machine learning
11615256 · 2023-03-28 ·

Methods for performing mixed-mode Multiply-Accumulate (MAC) functions in an integrated circuit (IC) are disclosed. By performing part of the MAC operation spatially and in parallel, and part of it temporally and serially, the number of MAC operations can be programmed in the serial/temporal MAC segment as a multiple of the parallel/spatial MAC segment. This trait provides a degree of flexibility in programming the mixed-mode MAC function. A Programmable-Hybrid-Accumulation (PHA) method performs the accumulation function of the MAC IC by transforming the accumulation signal into a hybrid accumulation signal. The hybrid accumulation signal comprises a Most-Significant-Portion (MSP) and a Least-Significant-Portion (LSP), wherein the portions of the hybrid accumulation signal can be programmed in accordance with cost-performance objectives of an end application. Transforming the accumulated signal into a hybrid signal, and utilizing the PHA method, keeps the signal magnitudes bounded, which prevents signal overflow while accumulation cycles proceed. Arranging a mixed-signal MAC in accordance with the PHA method can, among other benefits, help to limit the peak-to-peak analog signal swings, which enhances performance attributes such as lower current consumption, faster speed, lower power supply voltage, and a wider signal accumulation range before power supply operating headroom conditions are breached.
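
The abstract above describes a mixed-signal circuit; the following is only a digital software analogy of the bookkeeping, offered as a rough sketch. It assumes the spatial segment is a fixed-width group of products summed at once, the temporal segment is a loop over such groups, and the accumulator is split into a bounded LSP plus an MSP carry count. The names `hybrid_mac`, `parallel_width`, and `lsp_bits` are hypothetical and do not come from the patent.

```python
def hybrid_mac(xs, ws, parallel_width, lsp_bits):
    """Accumulate sum(x*w) as a (msp, lsp) pair with a bounded LSP.

    Assumption: the total equals msp * 2**lsp_bits + lsp, and lsp is
    always kept in the range [0, 2**lsp_bits).
    """
    if len(xs) != len(ws) or len(xs) % parallel_width != 0:
        raise ValueError("inputs must match and fill whole parallel groups")
    lsp_modulus = 1 << lsp_bits
    msp, lsp = 0, 0
    # Temporal/serial segment: one loop iteration per accumulation cycle.
    for start in range(0, len(xs), parallel_width):
        stop = start + parallel_width
        # Spatial/parallel segment: one group of products summed together.
        partial = sum(x * w for x, w in zip(xs[start:stop], ws[start:stop]))
        lsp += partial
        # Move whatever exceeds the LSP range into the MSP so that the
        # LSP magnitude stays bounded as cycles proceed.
        carry, lsp = divmod(lsp, lsp_modulus)
        msp += carry
    return msp, lsp


if __name__ == "__main__":
    msp, lsp = hybrid_mac(list(range(1, 9)), [2] * 8, parallel_width=4, lsp_bits=4)
    print(msp, lsp, msp * 16 + lsp)
```

In this sketch the number of serial cycles is simply the input length divided by `parallel_width`, which mirrors the idea of programming the temporal segment as a multiple of the spatial one.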

Digital approximate multipliers for machine learning and artificial intelligence applications
11467805 · 2022-10-11 ·

Digital approximate multipliers (aMULT) utilizing interpolative apparatuses, circuits, and methods are described in this disclosure. The disclosed aMULT interpolative methods can be arranged or programmed to operate asynchronously and/or synchronously. For applications where less precision is acceptable, fewer interpolations yield less precise multiplication results, but such approximate multiplication can be computed faster and at lower power consumption. Conversely, for applications where higher precision is required, more interpolations can generate more precise multiplication results. As such, by utilizing the disclosed aMULT method, the resolution and precision objectives of an approximate multiplication function can be pre-programmed or adjusted in real time and/or on the fly, which enables optimizing for different and flexible power consumption and speed of multiplication, in addition to enabling the optimization of an approximate multiplier's die size and cost in accordance with cost-performance objectives.
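
The abstract does not spell out the interpolation scheme, so the following is an illustrative sketch under stated assumptions rather than the patented circuit. It assumes inputs scaled to [0, 1], uses the quarter-square identity xy = ((x+y)/2)^2 - ((x-y)/2)^2, and approximates each square by piecewise-linear interpolation. The names `approx_square`, `approx_multiply`, and `segments` are hypothetical; `segments` stands in for the programmable precision.

```python
def approx_square(x, segments):
    """Piecewise-linear interpolation of x**2 on [0, 1] with equal segments."""
    h = 1.0 / segments
    i = min(int(x / h), segments - 1)      # index of the segment holding x
    x0, x1 = i * h, (i + 1) * h
    # Chord through (x0, x0**2) and (x1, x1**2); its slope is x0 + x1.
    return x0 * x0 + (x0 + x1) * (x - x0)


def approx_multiply(x, y, segments):
    """Approximate x*y for x, y in [0, 1] from two interpolated squares."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("this sketch assumes inputs scaled to [0, 1]")
    return (approx_square((x + y) / 2.0, segments)
            - approx_square(abs(x - y) / 2.0, segments))


if __name__ == "__main__":
    for segments in (2, 8, 32):
        est = approx_multiply(0.3, 0.7, segments)
        print(segments, est, abs(est - 0.21))
```

With this construction the absolute error is bounded by 1/(4 * segments**2), so doubling the number of segments cuts the worst-case error by a factor of four, which is one concrete way to read the precision versus speed and power trade-off described above.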

Evaluating Polynomials in Hardware Logic
20170308355 · 2017-10-26 ·

An accurate implementation of a polynomial using floating-point or other rounded arithmetic can be generated using a plurality of hardware logic components which each implement an input polynomial such that the zeros in the input polynomial can be determined correctly. The number of different hardware logic components that are used can be reduced by analysing the set of input polynomials and from it generating a set of polynomial components, where each polynomial in the set of input polynomials which is not also in the set of polynomial components, can be generated from a single one of the polynomial components.

LOGARITHM AND POWER (EXPONENTIATION) COMPUTATIONS USING MODERN COMPUTER ARCHITECTURES

Embodiments of the present invention may provide the capability to evaluate logarithm and power (exponentiation) functions using either hardware specific instructions, or a hardware specific implementation with reduced memory requirements. An input comprising a floating point representation of a real number may be received and a mantissa and an exponent may be extracted. A function of a logarithm of a mantissa of the real number may be approximated by utilizing a polynomial based on the mantissa. The approximated function of the logarithm may be combined with the exponent for calculating a value comprising a logarithm of the real number. Likewise, an input comprising a floating point representation of a real number and a representation of a second number may be received and an approximation of the real number to the power of the second number may be generated.
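
The flow in this abstract (extract mantissa and exponent, approximate the logarithm of the mantissa with a polynomial, combine with the exponent) can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: it assumes a truncated Taylor polynomial for ln(1+t) instead of whatever polynomial the patent uses, and the power function falls back on the library `math.exp`. The names `approx_log`, `approx_pow`, and `degree` are hypothetical.

```python
import math

LN2 = math.log(2.0)


def approx_log(x, degree=12):
    """Natural log of a positive finite float via mantissa/exponent split."""
    if not (x > 0.0) or math.isinf(x):
        raise ValueError("this sketch handles positive finite inputs only")
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= m < 1
    if m < math.sqrt(0.5):            # recentre so m is in [sqrt(1/2), sqrt(2))
        m *= 2.0
        e -= 1
    t = m - 1.0
    # Horner evaluation of t - t**2/2 + t**3/3 - ... up to t**degree.
    acc = 0.0
    for k in range(degree, 0, -1):
        coeff = (1.0 if k % 2 == 1 else -1.0) / k
        acc = coeff + t * acc
    # ln(x) = ln(m) + e * ln(2)
    return t * acc + e * LN2


def approx_pow(x, y, degree=12):
    """x**y for positive x, using exp(y * ln x) with the approximate log."""
    return math.exp(y * approx_log(x, degree))


if __name__ == "__main__":
    for value in (0.7, 3.0, 12345.678):
        print(value, approx_log(value), math.log(value))
```

Recentring the mantissa around 1 keeps |t| below about 0.42, so the degree-12 polynomial here stays within roughly 1e-6 of the true logarithm; a fitted (for example minimax) polynomial would reach the same accuracy with fewer terms.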

COMPUTING PROCESSOR
20170220321 · 2017-08-03 ·

Improved computing processor. In an embodiment, one or more roots of a perturbed polynomial equation, comprising a plurality of terms, are computed, assuming a non-zero coefficient for a highest-order one of the plurality of terms. For at least a highest-order term, an error upper bound of an unperturbed coefficient of the term is computed, it is determined whether a perturbed coefficient of the term is less than or equal to the error upper bound, and, when the perturbed coefficient of the term is less than or equal to the error upper bound, one or more roots of the perturbed polynomial equation are computed, assuming a zero coefficient for the term. Each computed root is added to a root set.
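
As a rough illustration of the root-set idea, the sketch below handles only a quadratic and only its highest-order term. How the error upper bound is computed is not stated in the abstract, so it is passed in as a parameter; comparing the coefficient's magnitude against the bound is also an assumption. The names `candidate_roots` and `err_a` are hypothetical.

```python
import cmath


def candidate_roots(a, b, c, err_a):
    """Root set for a*x**2 + b*x + c when `a` may be a perturbed zero.

    err_a is an assumed, externally supplied error upper bound on the
    unperturbed leading coefficient.
    """
    roots = set()
    # Case 1: assume a non-zero leading coefficient (quadratic formula).
    if a != 0:
        d = cmath.sqrt(b * b - 4 * a * c)
        roots.add((-b + d) / (2 * a))
        roots.add((-b - d) / (2 * a))
    # Case 2: the leading coefficient is indistinguishable from zero, so
    # also solve the reduced (linear) equation b*x + c = 0.
    if abs(a) <= err_a and b != 0:
        roots.add(complex(-c / b))
    return roots


if __name__ == "__main__":
    print(candidate_roots(1.0, -3.0, 2.0, 1e-9))
    print(candidate_roots(1e-12, 1.0, -1.0, 1e-9))
```

In the second call the tiny leading coefficient falls within the bound, so the root of the linear equation is added alongside the two quadratic roots, one of which is very large in magnitude.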

Secure computation system, secure computation device, secure computation method, and program

A secure computation technique of calculating a polynomial in a shorter calculation time is provided. A secure computation system generates concealed text [[u]] of u, which is the result of magnitude comparison between a value x and a random number r, from concealed text [[x]] by using concealed text [[r]]; generates concealed text [[c]] of a mask c from the concealed text [[x]], [[r]], and [[u]]; reconstructs the mask c from the concealed text [[c]]; calculates, for i=0, ..., n, a coefficient b_i from an order n, coefficients a_0, a_1, ..., a_n, and the mask c; generates, for i=1, ..., n, concealed text [[s_i]] of a selected value s_i, which is determined in accordance with the result u of magnitude comparison, from the concealed text [[u]]; and calculates a linear combination b_0 + b_1[[s_1]] + ... + b_n[[s_n]] of the coefficient b_i and the concealed text [[s_i]] as concealed text [[a_0 + a_1 x^1 + ... + a_n x^n]].

Digital approximate squarer for machine learning
11416218 · 2022-08-16 ·

Digital approximate squarers (aSQRs) utilizing apparatuses, circuits, and methods are described in this disclosure. The disclosed aSQR methods can operate asynchronously and/or synchronously. For applications where lower precision is acceptable, fewer interpolations can yield a less precise square approximation, which can be computed faster and with lower power consumption. Conversely, for applications where higher precision is required, more interpolation steps can generate a more precise square approximation. By utilizing the disclosed aSQR method, the precision objectives of a squarer approximation function can be programmed in real time and on the fly, which enables optimizing for power consumption and speed of squaring, in addition to optimizing the approximate squarer's die size and cost.
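
One well-known way to build a square approximation whose precision grows with each interpolation step is to start from the chord y = x on [0, 1] and subtract successively smaller triangle-wave corrections. The sketch below uses that construction as an assumption; the abstract does not confirm that the patented aSQR works this way. The names `tent`, `approx_square`, and `steps` are hypothetical.

```python
def tent(x):
    """Triangle (tent) map on [0, 1]: rises to 1 at x = 0.5, then falls."""
    return 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)


def approx_square(x, steps):
    """Approximate x**2 on [0, 1] with `steps` interpolation refinements.

    After `steps` refinements the result equals the piecewise-linear
    interpolation of x**2 on 2**steps equal segments.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("this sketch assumes x scaled to [0, 1]")
    approx = x          # step 0: straight line from (0, 0) to (1, 1)
    t = x
    for s in range(1, steps + 1):
        t = tent(t)                  # next, finer triangle wave
        approx -= t / 4.0 ** s       # each correction is 4x smaller
    return approx


if __name__ == "__main__":
    for steps in range(5):
        est = approx_square(0.3, steps)
        print(steps, est, est - 0.09)
```

Each extra step reduces the worst-case error by a factor of four (the bound is 4**-(steps + 1)), so stopping early trades precision for fewer operations, in line with the trade-off the abstract describes.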

Multiplier circuit for accelerated square operations

In one embodiment, an apparatus comprises a multiplier circuit to: identify a plurality of partial products associated with a multiply operation; partition the plurality of partial products into a first set of partial products, a second set of partial products, and a third set of partial products; determine whether the multiply operation is associated with a square operation; upon a determination that the multiply operation is associated with the square operation, compute a result based on the first set of partial products and the third set of partial products; and upon a determination that the multiply operation is not associated with the square operation, compute the result based on the first set of partial products, the second set of partial products, and the third set of partial products.
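
The abstract does not say how the three sets of partial products are chosen. One plausible reading, used here purely as an assumption, is that the bit-level partial-product array is split into the part above the diagonal, the part below it, and the diagonal itself; for a square the two off-diagonal parts are equal, so one of them can be skipped and the other doubled. The function name `multiply` and its `bits` parameter are hypothetical.

```python
def multiply(a, b, bits=8):
    """Unsigned multiply of two `bits`-wide integers via partial products."""
    is_square = (a == b)
    first = second = third = 0
    for i in range(bits):
        for j in range(bits):
            if is_square and i > j:
                continue             # mirrored terms are never generated
            pp = (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
            if i < j:
                first += pp          # above the diagonal
            elif i > j:
                second += pp         # below the diagonal
            else:
                third += pp          # diagonal terms
    if is_square:
        # By symmetry the second set equals the first, so the result
        # depends only on the first and third sets.
        return 2 * first + third
    return first + second + third


if __name__ == "__main__":
    print(multiply(13, 13), multiply(13, 11))
```

In hardware the doubling would be a one-bit left shift, so the square path needs roughly half of the partial-product array.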

HARDWARE TO PERFORM SQUARING
20240134607 · 2024-04-25 ·

Methods of calculating a square of an input number in hardware logic are described. An m-bit number is received and Booth encoding is performed on different groups of three consecutive bits selected from the input to generate an encoded value for each of the groups. For each group, the method comprises forming a truncated string from the input number, generating an updated version of the truncated number and selecting a bit string based on the encoded value, the selected bit string comprising zeros or a left-shifted version of the updated version of the truncated number sign extended to a bit-width of 2m bits. The method further comprises combining the selected bit strings and square and sign bits for each group into an addition array; and summing the bits in the addition array.
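
The sketch below illustrates only the Booth-encoding step mentioned in the abstract, using standard radix-4 Booth recoding on overlapping groups of three bits, followed by a plain Booth-style square. It does not attempt the truncated-string or addition-array optimizations, and it assumes an even bit-width and a signed two's-complement input. The names `booth_radix4_digits` and `booth_square` are hypothetical.

```python
def booth_radix4_digits(x, m):
    """Radix-4 Booth digits (each in -2..2) of a signed m-bit integer.

    Digit k is formed from bits (2k+1, 2k, 2k-1), with bit -1 taken as 0.
    """
    if m % 2 != 0 or not -(1 << (m - 1)) <= x < (1 << (m - 1)):
        raise ValueError("need even m and x in signed m-bit range")
    u = x & ((1 << m) - 1)           # two's-complement bit pattern
    digits = []
    prev = 0                         # implicit bit to the right of bit 0
    for i in range(0, m, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_square(x, m):
    """Square x by summing Booth-selected multiples of x (up to 2m bits)."""
    total = 0
    for k, d in enumerate(booth_radix4_digits(x, m)):
        # Each digit selects 0, +/-x or +/-2x, weighted by 4**k.
        total += d * x * (4 ** k)
    return total


if __name__ == "__main__":
    print(booth_radix4_digits(-7, 4), booth_square(-7, 4))
```

Because each digit is limited to the range -2 to 2, every selected term is either zero or a shifted, possibly negated copy of the input, which is what makes the encoding attractive in hardware.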

EFFICIENT FLOATING POINT SQUARER
20240134602 · 2024-04-25 ·

Methods of squaring, in hardware logic, a floating point number comprising an m-bit input exponent and an input mantissa comprise generating a candidate mantissa output, in mantissa hardware logic, by squaring the input mantissa and generating, in exponent and exception logic, three candidate exponent outputs. The three candidate exponent outputs comprise (i) an exceptional exponent output, (ii) an exponent output generated from the m-bit input exponent and (iii) an incremental exponent generated by incrementing the exponent output. The method further comprises selecting, as the output mantissa, either the candidate mantissa output or an exceptional mantissa output based on exception signals generated by the exponent and exception logic based on the m-bit input exponent. The method additionally comprises selecting, as an output exponent, one of the three candidate exponent outputs based on the exception signals and based on a signal indicating a mantissa overflow condition.
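
A simplified software model can make the exponent selection concrete. The sketch below assumes a toy format in which the value is mant * 2**(exp - bias) with mant in [1, 2) and an m-bit biased exponent; it ignores rounding, subnormals, and NaN, and it derives the exception condition from the selected exponent rather than from the input exponent alone as the abstract describes. The name `fp_square` and the (exponent, mantissa) tuple representation are hypothetical.

```python
def fp_square(exp, mant, m=8):
    """Square a toy floating point number given as (biased exp, mantissa).

    Returns (exponent, mantissa). An all-ones exponent with mantissa 0.0
    stands for infinity, and exponent 0 with mantissa 0.0 for zero.
    """
    bias = (1 << (m - 1)) - 1
    max_normal = (1 << m) - 2

    # Candidate mantissa: the square lies in [1, 4), so at most one
    # renormalizing shift is needed.
    squared = mant * mant
    mant_overflow = squared >= 2.0
    cand_mant = squared / 2.0 if mant_overflow else squared

    # Non-exceptional candidate exponents: doubled exponent, and that
    # value incremented for the mantissa-overflow case.
    base_exp = 2 * exp - bias
    inc_exp = base_exp + 1
    selected = inc_exp if mant_overflow else base_exp

    # Exceptional outputs replace both exponent and mantissa.
    if selected > max_normal:
        return (max_normal + 1, 0.0)     # overflow to infinity
    if selected < 1:
        return (0, 0.0)                  # underflow flushed to zero
    return (selected, cand_mant)


if __name__ == "__main__":
    print(fp_square(127, 1.5))   # 1.5 squared is 2.25 = 1.125 * 2**1
```

Computing both `base_exp` and `inc_exp` up front mirrors the idea that the exponent path does not have to wait for the mantissa squaring to finish; only the final selection depends on the mantissa overflow signal.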