Patent classifications
G06F7/487
PROCESSING-IN-MEMORY (PIM) DEVICE
A PIM device includes a memory/arithmetic region including a plurality of memory banks and a plurality of multiply-accumulate (MAC) operators, among them a first MAC operator; a peripheral region including a data input/output circuit; and a global data input/output (GIO) line that provides a data transmission path between the peripheral region and the memory/arithmetic region. The first MAC operator is configured to perform an element-wise multiplication (EWM) operation by multiplying first input data and second input data, transmitted from first and second memory banks of the plurality of memory banks, respectively, to generate multiplication result data, and then transmitting the multiplication result data to a third memory bank. While the EWM operation is being performed, data transmission through the GIO line between the peripheral region and the memory/arithmetic region is blocked.
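The EWM flow above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: the class `PIMDevice`, its methods, and the `gio_blocked` flag are hypothetical names, banks are modeled as Python lists, and "blocking the GIO line" is modeled as rejecting host reads and writes while the in-bank multiplication runs.

```python
class PIMDevice:
    """Toy model: banks hold lists of numbers; a flag stands in for the GIO line."""

    def __init__(self, num_banks: int) -> None:
        self.banks = [[] for _ in range(num_banks)]
        self.gio_blocked = False  # True while an in-bank operation owns the datapath

    def _check_gio(self) -> None:
        if self.gio_blocked:
            raise RuntimeError("GIO line blocked: EWM operation in progress")

    def host_write(self, bank: int, data) -> None:
        # Peripheral region -> bank transfer over the (modeled) GIO line.
        self._check_gio()
        self.banks[bank] = list(data)

    def host_read(self, bank: int) -> list:
        # Bank -> peripheral region transfer over the (modeled) GIO line.
        self._check_gio()
        return list(self.banks[bank])

    def ewm(self, src_a: int, src_b: int, dst: int) -> None:
        """Element-wise multiply banks src_a and src_b into bank dst."""
        a, b = self.banks[src_a], self.banks[src_b]
        if len(a) != len(b):
            raise ValueError("operand banks must hold the same number of elements")
        self.gio_blocked = True  # block peripheral traffic for the whole operation
        try:
            # Data moves bank -> MAC operator -> bank without using the GIO line.
            self.banks[dst] = [x * y for x, y in zip(a, b)]
        finally:
            self.gio_blocked = False
```

For example, after writing `[1, 2, 3]` to bank 0 and `[4, 5, 6]` to bank 1, `ewm(0, 1, 2)` leaves `[4, 10, 18]` in bank 2, and the host can read it once the operation has finished.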
METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
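Sorting one portion of the lanes in one order and the remaining lanes in the opposite order produces a bitonic sequence, which a bitonic merge can then turn into a fully sorted vector. The sketch below assumes the two portions are the lower and upper halves and that the vector length is a power of two; `vector_sort` and `bitonic_merge` are hypothetical helper names, and `sorted` stands in for whatever in-lane sorting network the hardware uses.

```python
def vector_sort(lanes, split, first_ascending=True, second_ascending=False):
    """Sort lanes[:split] and lanes[split:] independently, each in its own order."""
    first = sorted(lanes[:split], reverse=not first_ascending)
    second = sorted(lanes[split:], reverse=not second_ascending)
    return first + second


def bitonic_merge(seq, ascending=True):
    """Merge a bitonic sequence whose length is a power of two."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    assert n & (n - 1) == 0, "length must be a power of two"
    half = n // 2
    s = list(seq)
    for i in range(half):
        # Compare-exchange lanes that are half the sequence apart.
        if (s[i] > s[i + half]) == ascending:
            s[i], s[i + half] = s[i + half], s[i]
    return bitonic_merge(s[:half], ascending) + bitonic_merge(s[half:], ascending)
```

For instance, `vector_sort([5, 1, 4, 2, 8, 7, 3, 6], 4)` gives the bitonic vector `[1, 2, 4, 5, 8, 7, 6, 3]`, and `bitonic_merge` of that result gives `[1, 2, 3, 4, 5, 6, 7, 8]`.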
DATA PROCESSING SYSTEM, OPERATING METHOD THEREOF, AND COMPUTING SYSTEM USING THE SAME
A data processing system may include: a matrix splitting circuit configured to split a matrix into a positive matrix and a negative matrix and to store the positive matrix and the negative matrix in a first sub array and a second sub array, respectively, within a computation memory; a vector conversion circuit configured to generate an offset vector by adding an offset to each element of a vector, the offset converting the negative element having the largest absolute value among the elements of the vector into a zero or positive element, and to apply the offset vector to the row lines of the first sub array and the second sub array; and an offset correction circuit configured to generate an offset correction value by subtracting the product of the offset and the negative matrix from the product of the offset and the positive matrix, and to subtract the offset correction value from a computation value output from the first sub array and the second sub array.
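The offset trick works because the shifted vector x + c·1 satisfies (x + c·1)(P − N) − c·1(P − N) = x(P − N) = xM, so the offset contribution cancels exactly. The sketch below checks this identity numerically. It assumes the computation value is the positive-sub-array output minus the negative-sub-array output, that each column output is a vector-matrix product y[j] = Σᵢ x[i]·a[i][j], and that N holds the magnitudes of the negative entries; the function names are hypothetical.

```python
def split_matrix(m):
    """Return (pos, neg) with m == pos - neg and both non-negative."""
    pos = [[max(v, 0) for v in row] for row in m]
    neg = [[max(-v, 0) for v in row] for row in m]
    return pos, neg


def vec_mat(x, a):
    """Column outputs of a crossbar: y[j] = sum_i x[i] * a[i][j]."""
    return [sum(x[i] * a[i][j] for i in range(len(x))) for j in range(len(a[0]))]


def offset_vec_mat(x, m):
    pos, neg = split_matrix(m)
    # Offset that maps the most negative element to zero (0 if none is negative).
    offset = max(0, -min(x))
    shifted = [v + offset for v in x]  # all row inputs are now >= 0
    # Computation value: positive-sub-array output minus negative-sub-array output.
    raw = [p - n for p, n in zip(vec_mat(shifted, pos), vec_mat(shifted, neg))]
    # Correction value: (offset x positive matrix) - (offset x negative matrix).
    offset_vector = [offset] * len(x)
    corr = [p - n for p, n in zip(vec_mat(offset_vector, pos), vec_mat(offset_vector, neg))]
    return [r - c for r, c in zip(raw, corr)]
```

With x = [-3, 1, 2] and M = [[1, -2], [0, 3], [-1, 4]], the offset is 3, every row input becomes non-negative, and the corrected result equals the direct product [-5, 17].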
High-precision anchored-implicit processing
An apparatus includes a processing circuit and a storage device. The processing circuit is configured to perform one or more processing operations in response to one or more instructions to generate an anchored-data element. The storage device is configured to store the anchored-data element. A format of the anchored-data element includes an identification item, an overlap item, and a data item. The data item is configured to hold a data value of the anchored-data element. The identification item indicates an anchor value for the data value or one or more special values.
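One way to picture an anchored-data element is a fixed-point integer whose weight is set by the anchor, so that the represented value is data × 2^anchor. The abstract does not give field widths or encodings, so everything below is a hypothetical sketch. `AnchoredElement`, `encode`, `add`, and the string codes in `SPECIAL_CODES` are illustrative names. The overlap item is stored but deliberately not interpreted, because its exact role is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Union

SPECIAL_CODES = ("+inf", "-inf", "nan")  # hypothetical special-value codes


@dataclass(frozen=True)
class AnchoredElement:
    ident: Union[int, str]  # anchor exponent, or one of SPECIAL_CODES
    overlap: int            # stored but not interpreted in this sketch
    data: int               # fixed-point data value

    def value(self) -> float:
        if isinstance(self.ident, str):
            return float(self.ident)  # "+inf", "-inf", "nan" parse as floats
        return math.ldexp(self.data, self.ident)  # data * 2**anchor


def encode(x: float, anchor: int, overlap: int = 0) -> AnchoredElement:
    if math.isnan(x):
        return AnchoredElement("nan", overlap, 0)
    if math.isinf(x):
        return AnchoredElement("+inf" if x > 0 else "-inf", overlap, 0)
    # Quantize to the grid 2**anchor; bits below the anchor are rounded away.
    return AnchoredElement(anchor, overlap, round(math.ldexp(x, -anchor)))


def add(a: AnchoredElement, b: AnchoredElement) -> AnchoredElement:
    """Integer addition of two elements that share the same (non-special) anchor."""
    assert isinstance(a.ident, int) and a.ident == b.ident
    # The overlap field is simply carried from the first operand here.
    return AnchoredElement(a.ident, a.overlap, a.data + b.data)
```

Because elements with a shared anchor add as plain integers, sums are exact and independent of summation order as long as the data item does not overflow. For example, `encode(3.5, -4)` stores the integer 56.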
COMPUTER PROCESSOR FOR HIGHER PRECISION COMPUTATIONS USING A MIXED-PRECISION DECOMPOSITION OF OPERATIONS
Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction that specifies the locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction. Executing the instruction includes: converting the value of each operand into a plurality of lower-precision values, with an exponent stored for each operand; performing arithmetic operations among the lower-precision values converted from the values of the plurality of operands; and generating a floating-point value by converting the result of the arithmetic operations into the floating-point format, and storing that floating-point value.
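A minimal sketch of the decomposition idea, assuming a multiplication of two doubles: each operand keeps one exponent, its 53-bit significand is cut into three 18-bit integer chunks, the chunks are multiplied pairwise with narrow multiplies, and the exact wide sum is rounded once back to floating point. The chunk width, chunk count, and function names are hypothetical choices for illustration, not parameters from the source.

```python
import math

CHUNK_BITS = 18  # hypothetical width of each lower-precision piece
NUM_CHUNKS = 3   # 3 * 18 = 54 bits, enough for a 53-bit double significand
TOTAL_BITS = CHUNK_BITS * NUM_CHUNKS


def decompose(x: float):
    """Split x into (sign, exponent, chunks), most significant chunk first."""
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))             # abs(x) == m * 2**e, 0.5 <= m < 1
    sig = int(math.ldexp(m, TOTAL_BITS))  # exact integer significand
    mask = (1 << CHUNK_BITS) - 1
    chunks = [(sig >> (CHUNK_BITS * (NUM_CHUNKS - 1 - k))) & mask
              for k in range(NUM_CHUNKS)]
    return sign, e, chunks


def mixed_precision_mul(a: float, b: float) -> float:
    sa, ea, ca = decompose(a)
    sb, eb, cb = decompose(b)
    acc = 0
    for i, x in enumerate(ca):
        for j, y in enumerate(cb):
            partial = x * y  # narrow multiply: at most 2 * CHUNK_BITS bits
            shift = CHUNK_BITS * ((NUM_CHUNKS - 1 - i) + (NUM_CHUNKS - 1 - j))
            acc += partial << shift  # exact wide accumulation
    # acc * 2**(ea + eb - 2*TOTAL_BITS) is the exact product magnitude;
    # the int -> float conversion performs the single final rounding.
    return sa * sb * math.ldexp(acc, ea + eb - 2 * TOTAL_BITS)
```

Since the partial products are accumulated exactly and rounded only once, the result matches the correctly rounded native product `a * b` for operands whose product stays in the normal range.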
Methods to compress range doppler map (RDM) values from floating point to decibels (dB)
Embodiments of a telemetry device and methods to convert a binary floating-point number to a compressed number are described herein. The binary floating-point number may comprise a mantissa and an exponent. The telemetry device may determine a first number based on the product of the exponent and a constant, where the constant may be proportional to the logarithm of two. The telemetry device may determine a second number by using one or more bits of the mantissa as an index into a predetermined lookup table whose values may be proportional to the logarithms of candidate mantissa values. The telemetry device may then determine the compressed number by rounding a sum that includes the first and second numbers, where the rounding may be based on a predetermined step size.
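The conversion rests on log(m · 2^e) = e · log(2) + log(m): the exponent term needs only a multiply by a constant, and the mantissa term comes from a small table. The sketch below assumes power decibels (10·log10, so the constant is about 3.0103 dB per factor of two), a 4-bit table index, and a 0.5 dB step. All three values and the function name `compress_to_db` are hypothetical choices, not taken from the source.

```python
import math

STEP_DB = 0.5      # hypothetical rounding step size (dB)
INDEX_BITS = 4     # hypothetical number of mantissa bits used as the LUT index
DB_PER_OCTAVE = 10 * math.log10(2)  # power dB per factor of two (~3.0103)

# LUT[i] = 10*log10(1 + i / 2**INDEX_BITS): log of each candidate mantissa 1.f
LUT = [10 * math.log10(1 + i / 2 ** INDEX_BITS) for i in range(2 ** INDEX_BITS)]


def compress_to_db(x: float) -> int:
    """Return an integer code; the approximate level in dB is code * STEP_DB."""
    if x <= 0:
        raise ValueError("power values must be positive")
    m, e = math.frexp(x)       # x == m * 2**e, 0.5 <= m < 1
    mant, exp = 2 * m, e - 1   # renormalize to 1 <= mant < 2
    first = exp * DB_PER_OCTAVE                # exponent term
    index = int((mant - 1) * 2 ** INDEX_BITS)  # top fraction bits
    second = LUT[index]                        # mantissa term
    return round((first + second) / STEP_DB)
```

With these settings, 1.0 maps to code 0, 2.0 to code 6 (about 3 dB), and 100.0 to code 40 (20 dB). Because the index truncates the mantissa, the table term can read low by up to about 0.26 dB before the final rounding.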
SYSTOLIC ARRAY WITH INPUT REDUCTION TO MULTIPLE REDUCED INPUTS
Systems and methods are provided to perform multiply-accumulate operations on reduced-precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from it. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length, in which case the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs with the second bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the reduced input data elements and the reduced weights to generate multiple partial outputs, and may sum the partial outputs to generate the output.
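The combination-and-sum step can be checked with integers. This sketch assumes unsigned 16-bit inputs and weights, processing elements that accept only 8-bit operands, and a reducer that splits each value into high and low bytes. With that integer split, each partial output carries an implicit shift that is applied when the partials are summed. The bit widths and function names are hypothetical.

```python
FULL_BITS = 16     # hypothetical bit-length the array cannot consume directly
REDUCED_BITS = 8   # hypothetical bit-length the processing elements support
MASK = (1 << REDUCED_BITS) - 1


def reduce_input(v: int):
    """Split an unsigned FULL_BITS value into (high, low) reduced pieces."""
    assert 0 <= v < (1 << FULL_BITS)
    return (v >> REDUCED_BITS, v & MASK)  # v == (high << REDUCED_BITS) | low


def pe_dot(xs, ws):
    """Multiply-accumulate chain of one array column; operands must fit REDUCED_BITS."""
    acc = 0
    for x, w in zip(xs, ws):
        assert x <= MASK and w <= MASK
        acc += x * w
    return acc


def systolic_matvec(x, w):
    """y[j] = sum_i x[i] * w[i][j], computed from reduced inputs only."""
    rx = [reduce_input(v) for v in x]
    rw = [[reduce_input(v) for v in row] for row in w]
    n_rows, n_cols = len(x), len(w[0])
    y = [0] * n_cols
    for p in range(2):      # reduced piece of each input (0 = high, 1 = low)
        for q in range(2):  # reduced piece of each weight (0 = high, 1 = low)
            shift = REDUCED_BITS * ((1 - p) + (1 - q))
            for j in range(n_cols):
                xs = [rx[i][p] for i in range(n_rows)]
                ws = [rw[i][j][q] for i in range(n_rows)]
                y[j] += pe_dot(xs, ws) << shift  # sum the partial outputs
    return y
```

The four (input piece, weight piece) combinations produce four partial outputs per column, and their shifted sum equals the full-precision product computed directly.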
OPERATING METHOD OF FLOATING POINT OPERATION CIRCUIT AND INTEGRATED CIRCUIT INCLUDING FLOATING POINT OPERATION CIRCUIT
An operating method of a floating point operation circuit includes, in response to receiving a first instruction, generating a first output by performing a fused multiplication and addition operation on a first input, a second input, and a third input. The method further includes, in response to receiving a second instruction, generating a second output by inverting one of a fourth input, a fifth input, and a sixth input. Generating the second output includes generating a transform factor and a simplified value from the one input.
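One plausible reading of "transform factor and simplified value" is range reduction for a reciprocal: write x = ±d · 2^e with d in [0.5, 1), invert only d, and rescale by the signed power of two 2^-e. The sketch below follows that interpretation, which is an assumption, and refines 1/d with Newton-Raphson steps of the multiply-add form r + r·(1 − d·r). Hardware would fuse that step into an FMA, while plain float operations are used here. The function name and the classic linear seed 48/17 − 32/17·d are illustrative choices, and only finite inputs are handled.

```python
import math


def invert(x: float, iterations: int = 4) -> float:
    """Approximate 1/x for finite x via range reduction and Newton-Raphson."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    m, e = math.frexp(x)        # x == m * 2**e, 0.5 <= |m| < 1
    simplified = abs(m)         # simplified value in [0.5, 1)
    transform = math.copysign(math.ldexp(1.0, -e), x)  # signed 2**-e
    r = 48 / 17 - 32 / 17 * simplified  # linear seed, relative error <= 1/17
    for _ in range(iterations):
        # Newton-Raphson step; the relative error roughly squares each time.
        r = r + r * (1.0 - simplified * r)
    return transform * r
```

Starting from a relative error of at most 1/17, four iterations bring the error below double-precision rounding, so the result agrees with `1 / x` to within a few units in the last place.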