G06F7/49915

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09 ·

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

Data path for scalable matrix node engine with mixed data formats
11556615 · 2023-01-17 · ·

A microprocessor system comprises a matrix computational unit and a control unit. The matrix computational unit includes a plurality of processing elements. The control unit is configured to provide a matrix processor instruction to the matrix computational unit. The matrix processor instruction specifies a floating-point operand formatted using a first floating-point representation format. The matrix computational unit accumulates an intermediate result value calculated using the floating-point operand. The intermediate result value is in a second floating-point representation format.

Acceleration circuitry

Systems, apparatuses, and methods related to acceleration circuitry are described. The acceleration circuitry may be deployed in a memory device and can include a memory resource and/or logic circuitry. The acceleration circuitry can perform operations on data to convert the data between one or more numeric formats, such as floating-point and/or universal number (e.g., posit) formats. The acceleration circuitry can perform arithmetic and/or logical operations on the data after the data has been converted to a particular format. For instance, the memory resource can receive data comprising a bit string having a first format that provides a first level of precision. The logic circuitry can receive the data from the memory resource and convert the bit string to a second format that provides a second level of precision that is different from the first level of precision.

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20230229448 · 2023-07-20 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

Accumulating data values and storing in first and second storage devices

Herein described is a method of operating an accumulation process in a data processing apparatus. The accumulation process comprises a plurality of accumulations which output a respective plurality of accumulated values, each based on a stored value and a computed value generated by a data processing operation. The method comprises storing a first accumulated value, the first accumulated value being one of said plurality of accumulated values, into a first storage device comprising a plurality of single-bit storage elements; determining that a predetermined trigger has been satisfied with respect to the accumulation process; and in response to the determining, storing at least a portion of a second accumulated value, the second accumulated value being one of said plurality of accumulated values, into a second storage device.

Prepare for shorter precision (round for reround) mode in a decimal floating-point instruction

An instruction is executed in round-for-reround mode wherein the permissible resultant value that is closest to and no greater in magnitude than the infinitely precise result is selected. If the selected value is not exact and the units digit of the selected value is either 0 or 5, then the digit is incremented by one and the selected value is delivered. In all other cases, the selected value is delivered.

FLOATING POINT FUSED MULTIPLY ADD WITH MERGED 2'S COMPLEMENT AND ROUNDING
20230214179 · 2023-07-06 ·

A method includes receiving an unrounded mantissa value and a round bit associated with the unrounded mantissa value. The method also includes receiving a 2's complement signal that indicates whether the unrounded mantissa value results from a 1's complement operation. The method includes incrementing the unrounded mantissa value to provide an incremented value. The unrounded mantissa value is a non-incremented value. The method further includes providing one of the incremented value or non-incremented value as a rounded mantissa value responsive to the 2's complement signal.

FLOATING POINT FUSED MULTIPLY ADD WITH REDUCED 1'S COMPLEMENT DELAY
20230214181 · 2023-07-06 ·

A method includes receiving a carry-sum value corresponding to a first portion of inputs to an adder, and receiving a second value corresponding to a second portion of inputs to the adder that do not overlap the first portion. Method includes providing an intermediate sum of carry and sum values of the carry-sum value, which generates a carry out (Cout). Method includes determining a sign of incremented second value, and a sign of non-incremented second value; complementing or passing, responsive to sign of incremented result, the incremented result as a first output; complementing or passing, responsive to sign of non-incremented result, the non-incremented result as a second output; complementing or passing, responsive to Cout, sign of incremented result, and sign of non-incremented result, the intermediate sum as a third output; selecting one of the first, second outputs responsive to Cout; and providing final sum comprising third output and selected output.

ARITHMETIC DEVICE, METHOD, AND PROGRAM
20220382544 · 2022-12-01 · ·

A processor determines an exponent common to a plurality of numerical values, determines a mantissa for each of the plurality of numerical values based on the determined exponent, and performs four arithmetic operations using a sign, the determined exponent, and the determined mantissa.

Histogram-based per-layer data format selection for hardware implementation of deep neural network

A histogram-based method of selecting a fixed point number format for representing a set of values input to, or output from, a layer of a Deep Neural Network (DNN). The method comprises obtaining a histogram that represents an expected distribution of the set of values of the layer, each bin of the histogram is associated with a frequency value and a representative value in a floating point number format; quantising the representative values according to each of a plurality of potential fixed point number formats; estimating, for each of the plurality of potential fixed point number formats, the total quantisation error based on the frequency values of the histogram and a distance value for each bin that is based on the quantisation of the representative value for that bin; and selecting the fixed point number format associated with the smallest estimated total quantisation error as the optimum fixed point number format for representing the set of values of the layer.