G06F7/499

Tracking streaming engine vector predicates to control processor execution

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
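
The loop control described above can be sketched in software as follows. This is only an illustration under stated assumptions: the 4-lane vector width, the `stream` generator standing in for the streaming engine, and the name `predicated_sum` are hypothetical and do not come from the abstract.

```python
from typing import Iterator, List, Tuple

LANES = 4  # assumed vector width, chosen only for illustration


def stream(data: List[int]) -> Iterator[Tuple[List[int], List[bool]]]:
    """Hypothetical stand-in for a streaming engine.

    Yields (data vector, vector predicate) pairs. Once the data runs out,
    every further vector has an all-false predicate.
    """
    for start in range(0, len(data), LANES):
        chunk = data[start:start + LANES]
        pad = LANES - len(chunk)
        yield chunk + [0] * pad, [True] * len(chunk) + [False] * pad
    while True:
        yield [0] * LANES, [False] * LANES


def predicated_sum(data: List[int]) -> Tuple[int, int]:
    """Sum the valid elements; return (total, iterations with valid data)."""
    total = 0
    iterations = 0
    source = stream(data)
    while True:
        vector, predicate = next(source)
        # Lanes whose predicate bit is false contribute nothing.
        total += sum(v for v, valid in zip(vector, predicate) if valid)
        if not any(predicate):
            break  # no valid elements: exit the loop
        iterations += 1  # at least one valid element: repeat the loop
    return total, iterations
```

For ten input elements the loop body sees three vectors with valid data (4, 4, and 2 valid lanes) and exits on the fourth, whose predicate is all false.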

ROUNDING CIRCUITRY AND METHOD
20170344342 · 2017-11-30

A data processing apparatus for performing rounding on an input value to produce a rounded form output value includes floor calculation circuitry that receives the input value in redundant representation and generates two candidates of a floor of the input value in non-redundant representation. Ceiling calculation circuitry receives the input value in redundant representation and generates two candidates of a ceiling of the input value in non-redundant representation. Selection circuitry outputs one of the two candidates of the floor of said input value and the two candidates of the ceiling of said input value as the rounded form output value, based on a sign of a residual value associated with the input value. Each of the two candidates of the floor of the input value corresponds with a different value of the sign of the residual value, and each of the two candidates of the ceiling of said input value corresponds with a different value of the sign of said residual value.

High performance floating-point adder with full in-line denormal/subnormal support
09830129 · 2017-11-28

According to one general aspect, an apparatus may include a floating-point addition unit that includes a far path circuit, a close path circuit, and a final result selector circuit. The far path circuit may be configured to compute a far path result based upon either the addition or the subtraction of the two floating point numbers regardless of whether the operands or the result include normal or denormal numbers. The close path circuit may be configured to compute a close path result based upon the subtraction of the two floating point operands regardless of whether the operands or the result include normal or denormal numbers. The final result selector circuit may be configured to select between the far path result and the close path result based, at least in part, upon an amount of difference in the exponent portions of the two floating point operands.
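
A minimal sketch of the final selection step is given below. The abstract only says the choice depends, at least in part, on the exponent difference; the specific rule used here (close path only for an effective subtraction whose exponents differ by at most 1) is the conventional dual-path convention and is an assumption, as is the function name `select_path`.

```python
import math


def select_path(a: float, b: float) -> str:
    """Return which path's result would be selected for a + b.

    Assumed rule: the close path handles effective subtraction with an
    exponent difference of 0 or 1; everything else goes to the far path.
    """
    exp_a = math.frexp(a)[1]  # binary exponent of a
    exp_b = math.frexp(b)[1]  # binary exponent of b
    effective_subtraction = (a < 0) != (b < 0)
    if effective_subtraction and abs(exp_a - exp_b) <= 1:
        return "close"
    return "far"
```

The close path is where massive cancellation can occur, so it needs a full normalization shift; the far path needs a large alignment shift but at most a one-bit normalization.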

DATA PATH FOR SCALABLE MATRIX NODE ENGINE WITH MIXED DATA FORMATS
20230177108 · 2023-06-08

A microprocessor system comprises a matrix computational unit and a control unit. The matrix computational unit includes a plurality of processing elements. The control unit is configured to provide a matrix processor instruction to the matrix computational unit. The matrix processor instruction specifies a floating-point operand formatted using a first floating-point representation format. The matrix computational unit accumulates an intermediate result value calculated using the floating-point operand. The intermediate result value is in a second floating-point representation format.
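
The effect of accumulating in a second, wider format can be illustrated with a small sketch. The abstract does not name the two formats; IEEE 754 half precision for the operands and single precision for the accumulator are assumptions here, and `to_half`, `to_single`, `dot_mixed`, and `dot_half` are hypothetical helper names.

```python
import struct
from typing import Sequence


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_mixed(a: Sequence[float], b: Sequence[float]) -> float:
    """Half-precision operands, single-precision intermediate result."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_single(acc + to_half(x) * to_half(y))
    return acc


def dot_half(a: Sequence[float], b: Sequence[float]) -> float:
    """Same computation with the accumulator also kept in half precision."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_half(acc + to_half(to_half(x) * to_half(y)))
    return acc
```

Summing 4096 products of 1.0 × 1.0 gives 4096.0 with the wider accumulator, while the half-precision accumulator stalls at 2048.0 because 2049 is not representable in that format and rounds back down.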

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
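
As a rough software analogy, the mapping can be expressed as an index table that says which streamed element feeds each vector lane. The function name `permute` and the `lane_map` parameter are hypothetical; the abstract does not describe how the mapping is encoded.

```python
from typing import List, Sequence


def permute(stream_elements: Sequence[int], lane_map: Sequence[int]) -> List[int]:
    """Place stream_elements[lane_map[i]] into vector lane i."""
    return [stream_elements[src] for src in lane_map]
```

A map of `[3, 2, 1, 0]` reverses a four-element vector, and `[0, 0, 0, 0]` broadcasts the first element to every lane.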

Method and apparatus for implied bit handling in floating point multiplication

A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
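
One way such parallelism can work is sketched below; it is not confirmed to be the patented method. The product of the encoded fraction fields does not depend on the implied bits, so it can start immediately, while the implied bits are derived from the exponent fields and applied afterwards as correction terms. The 10-bit fraction width and all function names are assumptions.

```python
FRAC_BITS = 10  # assumed fraction width (as in IEEE 754 half precision)


def implied_bit(exponent_field: int) -> int:
    """Implied leading bit: 0 for a zero exponent field (subnormal), else 1."""
    return 0 if exponent_field == 0 else 1


def significand_product(exp1: int, frac1: int, exp2: int, frac2: int) -> int:
    """Multiply two significands given their encoded fields."""
    # Independent of the implied bits, so it need not wait for them.
    partial = frac1 * frac2
    # Determined from the exponent fields alone (conceptually in parallel).
    i1 = implied_bit(exp1)
    i2 = implied_bit(exp2)
    # (i1*2^F + f1) * (i2*2^F + f2) = f1*f2 + (i1*f2 + i2*f1)*2^F + i1*i2*2^(2F)
    correction = (i1 * frac2 + i2 * frac1) << FRAC_BITS
    correction += (i1 & i2) << (2 * FRAC_BITS)
    return partial + correction


def reference_product(exp1: int, frac1: int, exp2: int, frac2: int) -> int:
    """Straightforward version: resolve the implied bits first, then multiply."""
    s1 = (implied_bit(exp1) << FRAC_BITS) | frac1
    s2 = (implied_bit(exp2) << FRAC_BITS) | frac2
    return s1 * s2
```

Both functions return the same integer product for normal and subnormal inputs; only the order in which the implied bits are needed differs.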

Addition method, semiconductor device, and electronic device

An adder circuit inhibiting overflow is provided. A first memory, a second memory, a third memory, and a fourth memory are included. A step of supplying first data with a sign to the first memory and supplying the first data with a positive sign stored in the first memory, to the second memory; a step of supplying the first data with a negative sign stored in the second memory, to the third memory; a step of generating second data by adding the first data with a positive sign stored in the second memory and the first data with a negative sign stored in the third memory; and a step of storing the second data in the fourth memory are included. When the second data stored in the fourth memory are all second data with a positive sign or all second data with a negative sign, all the second data stored in the fourth memory are added.

Computing accelerator using a lookup table

A computing accelerator using a lookup table. The accelerator may accelerate floating point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.
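
A small software sketch of the three techniques named above follows. It assumes a 4-bit fraction (the approximation), stores only table entries with `a <= b` (the symmetry), and returns early when an operand is zero (zero-skipping). The table layout and the names `TABLE`, `split`, and `lut_multiply` are hypothetical, not taken from the abstract.

```python
import math
from typing import Tuple

FRAC_BITS = 4          # assumed fraction width; wider inputs are truncated
ONE = 1 << FRAC_BITS   # fixed-point value of the leading 1

# Symmetric table: keep only entries with a <= b, since a*b == b*a.
TABLE = {
    (a, b): (ONE + a) * (ONE + b)
    for a in range(ONE)
    for b in range(a, ONE)
}


def split(x: float) -> Tuple[int, int]:
    """Return (truncated fraction field, exponent) so that |x| ~ (1 + f/ONE) * 2**e."""
    m, e = math.frexp(abs(x))             # |x| = m * 2**e with 0.5 <= m < 1
    frac = int((2.0 * m - 1.0) * ONE)     # truncate to FRAC_BITS bits
    return frac, e - 1


def lut_multiply(x: float, y: float) -> float:
    """Approximate x * y using the lookup table for the significand product."""
    if x == 0.0 or y == 0.0:
        return 0.0                        # zero-skipping: no table access
    fx, ex = split(x)
    fy, ey = split(y)
    key = (fx, fy) if fx <= fy else (fy, fx)
    magnitude = math.ldexp(TABLE[key], ex + ey - 2 * FRAC_BITS)
    return -magnitude if (x < 0) != (y < 0) else magnitude
```

With a 4-bit fraction the symmetric table holds 136 entries instead of 256. Operands that fit in 4 fraction bits multiply exactly; others are approximated by truncation.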

LOGARITHM AND POWER (EXPONENTIATION) COMPUTATIONS USING MODERN COMPUTER ARCHITECTURES

Embodiments of the present invention may provide the capability to evaluate logarithm and power (exponentiation) functions using either hardware specific instructions, or a hardware specific implementation with reduced memory requirements. An input comprising a floating point representation of a real number may be received and a mantissa and an exponent may be extracted. A function of a logarithm of a mantissa of the real number may be approximated by utilizing a polynomial based on the mantissa. The approximated function of the logarithm may be combined with the exponent for calculating a value comprising a logarithm of the real number. Likewise, an input comprising a floating point representation of a real number and a representation of a second number may be received and an approximation of the real number to the power of the second number may be generated.
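
The extract, approximate, and combine steps can be sketched as below. The abstract does not specify the polynomial; this sketch assumes a truncated series in s = (m − 1)/(m + 1), where m is the mantissa, and it uses the library `math.exp` for the final step of the power function. The names `approx_log` and `approx_pow` and the default of 8 terms are hypothetical.

```python
import math

LN2 = 0.6931471805599453  # natural logarithm of 2


def approx_log(x: float, terms: int = 8) -> float:
    """Approximate ln(x) for x > 0 from the mantissa and exponent of x."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    s = (m - 1.0) / (m + 1.0)       # |s| <= 1/3 on this mantissa range
    s2 = s * s
    # Horner evaluation of 1 + s^2/3 + s^4/5 + ... (truncated series).
    poly = 0.0
    for k in reversed(range(terms)):
        poly = poly * s2 + 1.0 / (2 * k + 1)
    ln_mantissa = 2.0 * s * poly    # ln(m) = 2 * (s + s^3/3 + s^5/5 + ...)
    return ln_mantissa + e * LN2    # combine with the exponent


def approx_pow(x: float, y: float) -> float:
    """Approximate x**y for x > 0 as exp(y * ln(x))."""
    return math.exp(y * approx_log(x))
```

Because the mantissa is confined to [0.5, 1), a short polynomial suffices; with 8 terms the truncation error of `approx_log` stays below about 1e-8.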

METHODS AND SYSTEMS OF OPERATING A NEURAL CIRCUIT IN A NON-VOLATILE MEMORY BASED NEURAL-ARRAY
20230177319 · 2023-06-08

In one aspect, a method of a neuron circuit includes the step of providing a plurality of 2^N−1 single-level-cell (SLC) flash cells for each synapse (Y_i) connected to a bit line forming a neuron. The method includes the step of providing an input vector (X_i) for each synapse Y_i, wherein each input vector is translated into an equivalent electrical signal ES_i (current I_DACi, pulse T_PULSEi, etc.). The method includes the step of providing an input current to each synapse sub-circuit varying from 2^0*ES_i to (2^N−1)*ES_i. The method includes the step of providing a set of weight vectors or synapses (Y_i), wherein each weight vector is translated into an equivalent threshold voltage level or resistance level to be stored in one of many non-volatile memory cells assigned to each synapse (Y_i). The method includes the step of providing for 2^N possible threshold voltage levels or resistance levels in the 2^N−1 non-volatile memory cells of each synapse, wherein each cell is configured to store one of the two possible threshold voltage levels. The method includes the step of converting the N digital bits of the weight vector or synapse Y_i into an equivalent threshold voltage level and storing it in the appropriate cell, corresponding to that threshold voltage level, among the many SLC cells assigned to the weight vector or synapse (Y_i). The method includes the step of turning off all remaining 2^N−1 flash cells of the respective synapse (Y_i).

Various other methods are presented of forming neuron circuits by providing a plurality of single-level-cell (SLC) and many-level-cell (MLC) non-volatile memory cells for each synapse (Y_i) electrically connected to form a neuron. The disclosure shows methods of forming neurons in various configurations of non-volatile memory cells (flash, RRAM, etc.) with different storage capabilities per cell, covering both SLC and MLC cells.