G06F7/552

Circuitry for floating-point power function

Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2.sup.(second input*log.sub.2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
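
The two-stage flow described in this abstract can be illustrated in software. The following is a minimal sketch, not the disclosed circuitry: the function names are hypothetical, and the set of corner conditions (zero exponent, unit base, NaN, zero base, negative base) is an assumption chosen for illustration. Infinite inputs and signed zeros are not treated.

```python
import math

def initial_pow(x: float, y: float) -> float:
    """Initial result 2**(y * log2(x)); only meaningful for x > 0."""
    try:
        return 2.0 ** (y * math.log2(x))
    except OverflowError:
        # Python raises on overflow; hardware would saturate to infinity.
        return math.inf

def corner_check(x: float, y: float, initial: float) -> float:
    """Return a corrected result for assumed corner cases, else `initial`."""
    if y == 0.0 or x == 1.0:
        return 1.0
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0:
        return math.inf if y < 0.0 else 0.0
    if x < 0.0:
        if not y.is_integer():
            return math.nan  # negative base, fractional exponent
        magnitude = initial_pow(-x, y)
        return -magnitude if int(y) % 2 else magnitude
    return initial

def power(x: float, y: float) -> float:
    """Initial estimate followed by the corner check."""
    x, y = float(x), float(y)
    # log2 is undefined for x <= 0, so the initial result is a placeholder there.
    initial = initial_pow(x, y) if x > 0.0 else math.nan
    return corner_check(x, y, initial)
```

In this sketch the corner check sees both original inputs as well as the initial result, mirroring the abstract's statement that the instruction operates on the first and second inputs.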

Execution unit for evaluating functions using newton raphson iterations
11340868 · 2022-05-24

An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as to perform processing, by the plurality of processing circuits, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
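
As an illustration of the look-up-then-refine pattern, here is a minimal sketch for one possible operation, the reciprocal 1/d. The choice of function, the 16-entry table, the midpoint seeding, and all names are assumptions made for the example and are not taken from the patent.

```python
import math

TABLE_BITS = 4  # hypothetical table size: 2**4 = 16 entries

# Entry k holds the reciprocal of the midpoint of the k-th slice of [1, 2).
LUT = [1.0 / (1.0 + (k + 0.5) / 2**TABLE_BITS) for k in range(2**TABLE_BITS)]

def reciprocal(d: float, iterations: int = 3) -> float:
    """Approximate 1/d for finite d > 0 with a table seed and Newton-Raphson."""
    if not (d > 0.0 and math.isfinite(d)):
        raise ValueError("sketch only handles finite positive inputs")
    m, e = math.frexp(d)           # d = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1          # renormalize so m is in [1, 2)
    index = int((m - 1.0) * 2**TABLE_BITS)  # leading mantissa bits pick the entry
    x = LUT[index]                 # initial estimate of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)      # Newton-Raphson step for f(x) = 1/x - m
    return math.ldexp(x, -e)       # undo the exponent scaling
```

Each iteration uses only multiplications and a subtraction, and the relative error is roughly squared per step, so a seed accurate to about 3% reaches close to double precision in three steps.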

USE OF A SINGLE INSTRUCTION SET ARCHITECTURE (ISA) INSTRUCTION FOR VECTOR NORMALIZATION

Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes, responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit, by, for each N sets of inputs, each representing multiple component vectors for N of the vectors, performing N parallel dot product operations on the N sets of inputs; and (ii) generating V sets of outputs representing multiple normalized component vectors of the V vectors, N at a time, by a second processing unit, by, for each N squared length values of the V squared length values, performing N parallel operations on the N squared length values, wherein each of the N parallel operations implements a combination of a reciprocal square root function and a vector scaling function.
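
The two stages in this abstract (dot products, then a fused reciprocal square root and scale) can be mimicked sequentially in software. This is a minimal sketch with hypothetical function names; the group size n stands in for N, and the loops only model what the hardware would do in parallel. Zero-length vectors are not handled.

```python
import math

def squared_lengths(group):
    """Stage 1: one dot product of each vector with itself."""
    return [sum(c * c for c in v) for v in group]

def rsqrt_scale(group, sq_lengths):
    """Stage 2: reciprocal square root combined with component scaling."""
    out = []
    for v, s in zip(group, sq_lengths):
        inv_len = 1.0 / math.sqrt(s)
        out.append([c * inv_len for c in v])
    return out

def normalize(vectors, n=4):
    """Normalize all vectors, processing them n at a time."""
    result = []
    for start in range(0, len(vectors), n):
        group = vectors[start:start + n]
        result.extend(rsqrt_scale(group, squared_lengths(group)))
    return result
```

For example, `normalize([[3.0, 4.0, 0.0]])` yields a vector close to `[0.6, 0.8, 0.0]`.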

DATA NORMALIZATION PROCESSING METHOD, STORAGE MEDIUM AND COMPUTER EQUIPMENT
20220027126 · 2022-01-27

The present disclosure provides a data normalization processing method, a storage medium, and a computer device. According to the technical solution provided in the present disclosure, by adopting the method of data scaling and operator splicing, the input data in the deep learning neural network is normalized, which reduces the complexity of the normalization operation in the existing deep learning neural network, effectively prevents the data overflow in the process of data processing, and improves the operation speed of the deep learning neural network.

DEEP NEURAL NETWORK HARDWARE ACCELERATOR BASED ON POWER EXPONENTIAL QUANTIZATION

A deep neural network hardware accelerator comprises: an AXI-4 bus interface, an input cache area, an output cache area, a weighting cache area, a weighting index cache area, an encoding module, a configurable state controller module, and a PE array. The input cache area and the output cache area are designed as a line cache structure; an encoder encodes weightings according to an ordered quantization set, the quantization set storing the possible absolute values of all of the weightings after quantization. During the calculation of the accelerator, the PE unit reads data from the input cache area and the weighting index cache area to perform shift calculation, and sends the calculation result to the output cache area. The accelerator uses shift operations to replace floating point multiplication operations, reducing the requirements for computing resources, storage resources, and communication bandwidth, and increasing the calculation efficiency of the accelerator.
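
The idea of replacing multiplication with a shift can be shown with a small integer example. This is a minimal sketch under simplifying assumptions: weights are rounded to a signed power of two with a non-negative exponent, activations are integers, and the (sign, shift) encoding and all function names are hypothetical rather than the accelerator's actual index format.

```python
import math

def encode_weight(w: float):
    """Quantize w to sign * 2**shift and return (sign, shift); sign 0 means zero."""
    if w == 0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    # Round the exponent to the nearest integer; clamp at 0 for this sketch.
    shift = max(0, round(math.log2(abs(w))))
    return (sign, shift)

def shift_multiply(activation: int, code) -> int:
    """Compute activation * quantized weight using only a shift and a negate."""
    sign, shift = code
    if sign == 0:
        return 0
    shifted = activation << shift
    return shifted if sign > 0 else -shifted

def pe_dot(activations, codes) -> int:
    """Accumulate shift-based products, as a single processing element might."""
    return sum(shift_multiply(a, c) for a, c in zip(activations, codes))
```

With weights `[4, -2, 0, 1]` and activations `[3, 5, 7, 9]`, the encoded dot product equals the exact one, 11, because every weight is already a power of two or zero.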

Secure computation system, secure computation device, secure computation method, and program

A secure computation technique of calculating a polynomial in a shorter calculation time is provided. A secure computation system generates concealed text [[u]] of u, which is the result of magnitude comparison between a value x and a random number r, from concealed text [[x]] by using concealed text [[r]]; generates concealed text [[c]] of a mask c from the concealed text [[x]], [[r]], and [[u]]; reconstructs the mask c from the concealed text [[c]]; calculates, for i=0, . . . , n, a coefficient b.sub.i from an order n, coefficients a.sub.0, a.sub.1, . . . , a.sub.n, and the mask c; generates, for i=1, . . . , n, concealed text [[s.sub.i]] of a selected value s.sub.i, which is determined in accordance with the result u of magnitude comparison, from the concealed text [[u]]; and calculates a linear combination b.sub.0+b.sub.1[[s.sub.1]]+ . . . +b.sub.n[[s.sub.n]] of the coefficient b.sub.i and the concealed text [[s.sub.i]] as concealed text [[a.sub.0+a.sub.1x.sup.1+ . . . +a.sub.nx.sup.n]].

Iterative Estimation Hardware
20220391205 · 2022-12-08

A function estimation hardware logic unit may be implemented as part of an execution pipeline in a processor. The function estimation hardware logic unit is arranged to calculate, in hardware logic, an improved estimate of a function of an input value, d, where the function is given by

\[ \frac{1}{\sqrt[i]{d}}. \]

The hardware logic comprises a plurality of multipliers and adders arranged to implement an m.sup.th-order polynomial with coefficients that are rational numbers, where m is not equal to two and in various examples m is not equal to a power of two. In various examples i=1, i=2 or i=3. In various examples m=3.
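
One way to picture a polynomial refinement with rational coefficients is the following minimal sketch. It assumes the function is the reciprocal square root (the reciprocal i-th root with i=2) and assumes the polynomial is the binomial series of (1-e)^(-1/2) truncated at order m=3 in the error term e = 1 - d*x*x. The abstract does not state the actual coefficients, so these, and the function name, are illustrative only.

```python
from fractions import Fraction

# Assumed coefficients: series of (1 - e)**(-1/2) up to e**3.
COEFFS = [Fraction(1), Fraction(1, 2), Fraction(3, 8), Fraction(5, 16)]

def refine_rsqrt(d: float, x: float) -> float:
    """Improve an estimate x of 1/sqrt(d) with one polynomial evaluation."""
    e = 1.0 - d * x * x            # error term; zero when x is exact
    p = 0.0
    for c in reversed(COEFFS):     # Horner evaluation: multipliers and adders only
        p = p * e + float(c)
    return x * p                   # x * (1 + e/2 + 3e^2/8 + 5e^3/16)
```

Because 1/sqrt(d) equals x*(1-e)^(-1/2) exactly, dropping terms from e^4 onward leaves a new error proportional to the fourth power of the old one, so an estimate such as 0.7 for d=2 improves to better than one part in a million after a single step.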

Neural network learning apparatus for deep learning and method thereof
11803744 · 2023-10-31

Disclosed are a neural network learning apparatus for deep learning and a method thereof. A neural network learning apparatus for deep learning according to an embodiment of the present disclosure includes an input interface, a memory, and a learning processor for applying a Gradient Descent algorithm to a neural network model. The learning processor may transform a cumulative change function of the gradient for an error function into an inverse square root function in the Gradient Descent algorithm, and compute an approximate inverse square root value by applying a Newton-Raphson method to the transformed inverse square root function. The neural network learning apparatus for deep learning of the present disclosure may be connected or converged with an Artificial Intelligence module, an Unmanned Aerial Vehicle (UAV), a robot, an Augmented Reality (AR) apparatus, a Virtual Reality (VR) apparatus, or a 5G network service-related apparatus, etc.
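
A minimal sketch of how such an update might look follows. It assumes the "cumulative change function of the gradient" is an AdaGrad-style running sum of squared gradients; the abstract does not confirm this. The seeding scheme, iteration count, and function names are likewise hypothetical.

```python
import math

def rsqrt_newton(a: float, iterations: int = 7) -> float:
    """Approximate 1/sqrt(a) for a > 0 without a division or square root."""
    _, e = math.frexp(a)                  # a = m * 2**e with m in [0.5, 1)
    y = math.ldexp(1.0, -(e // 2))        # rough seed: a * y * y lies in [0.5, 2)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * a * y * y)   # Newton-Raphson step for 1/y**2 - a = 0
    return y

def adagrad_minimize(grad, w: float, lr: float = 1.0,
                     steps: int = 100, eps: float = 1e-8) -> float:
    """Gradient descent with step size scaled by 1/sqrt(sum of squared gradients)."""
    g_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g_sum += g * g                    # assumed cumulative gradient term
        w -= lr * g * rsqrt_newton(g_sum + eps)
    return w
```

For instance, minimizing (w - 3)**2 via `adagrad_minimize(lambda w: 2.0 * (w - 3.0), 0.0)` returns a value very close to 3. The seed keeps a*y*y below 3, which is the condition for this Newton-Raphson iteration to converge.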