Patent classifications
G06F7/487
PROCESSING IN FLOATING POINT FUSED MULTIPLY-ADD OPERATIONS
Improved processing in floating point fused multiply-add operations is described. An example of an apparatus includes a processor including circuitry to perform a floating point fused multiply-add (FMA) instruction, the FMA instruction requesting a calculation including multiplication of a first factor with a second factor to generate a product and addition of an addend to the product to generate a result; wherein, upon receiving the FMA instruction, the processor is to determine a shift to be applied in the calculation for the FMA instruction; determine whether a modified operation is applicable to the calculation for the FMA instruction, the determination being based at least in part on the determined shift to be applied in the calculation; and upon determining that the modified operation is applicable to the calculation for the FMA instruction, perform the modified operation to generate the result for the FMA instruction.
EFFICIENT CIRCUIT FOR NEURAL NETWORK PROCESSING
A system and method for efficient processing for neural network inference operations. In some embodiments, the system includes: a circuit configured to multiply a first number by a second number, the first number being represented as: a sign bit five exponent bits, and seven mantissa bits, representing an eight-bit full mantissa.
FLOATING-POINT COMPUTATION APPARATUS AND METHOD USING COMPUTING-IN-MEMORY
Disclosed herein are a floating-point computation apparatus and method using Computing-in-Memory (CIM). The floating-point computation apparatus performs a multiply-and-accumulation operation on pieces of input neuron data represented in a floating-point format, and includes a data preprocessing unit configured to separate and extract an exponent and a mantissa from each of the pieces of input neuron data, an exponent processing unit configured to perform CIM on input neuron exponents, which are exponents separated and extracted from the input neuron data, and a mantissa processing unit configured to perform a high-speed computation on input neuron mantissas, separated and extracted from the input neuron data, wherein the exponent processing unit determines a mantissa shift size for a mantissa computation and transfers the mantissa shift size to the mantissa processing unit, and the mantissa processing unit normalizes a result of the mantissa computation and transfers a normalization value to the exponent processing unit.
ARITHMETIC PROCESSING APPARATUS AND ARITHMETIC PROCESSING METHOD
An arithmetic processing apparatus includes a processor. The processor is configured to execute a parallel calculation on a plurality of pieces of floating-point data; determine whether or not information loss is to occur in the parallel calculation; and output a result of the parallel calculation when it is determined that the information loss is not to occur, and execute a sequential calculation on the plurality of pieces of floating-point data to output the result of the sequential calculation when it is determined that the information loss is to occur.
Method and apparatus for vector based finite impulse response (FIR) filtering
A method is provided that includes performing, by a processor in response to a vector finite impulse response (VFIR) filter instruction, generating of a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, the plurality of coefficients specified by a coefficient operand of the VFIR filter instruction and the plurality of sequential data elements specified by a data operand of the VFIR filter instruction, and storing the filter outputs in a storage location specified by the VFIR filter instruction.
Method and apparatus for vector based finite impulse response (FIR) filtering
A method is provided that includes performing, by a processor in response to a vector finite impulse response (VFIR) filter instruction, generating of a plurality of filter outputs using a plurality of coefficients and a plurality of sequential data elements, the plurality of coefficients specified by a coefficient operand of the VFIR filter instruction and the plurality of sequential data elements specified by a data operand of the VFIR filter instruction, and storing the filter outputs in a storage location specified by the VFIR filter instruction.
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
Approach to power reduction in floating-point operations
An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.
Method and apparatus for permuting streamed data elements
A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.