Patent classifications
G06F7/499
METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
RELIABLE SUPERVISED MACHINE LEARNING USING INTERVAL ARITHMETIC
An interval arithmetic based system and method for solving a global optimization problem is contemplated and provided. Provisions are made for a bisection indexing scheme for a parameter domain of an objective function wherein unique codes are assigned to each iterative interval subset of the parameter domain. Relationships for, between and among iterative subdivisions are arithmetically delimited, with unique codes populating an integer field of a bisection queue system memory component for particular arrays in a bisection context system memory component, and a further integer array of the bisection context. In connection to depth first domain bisection, operations are undertaken relative to the bisection context which are memorialized in relation to the bisection queue, operations which include a work stealing scheme for simultaneous breadth first searching.
Data path for scalable matrix node engine with mixed data formats
A microprocessor system comprises a matrix computational unit and a control unit. The matrix computational unit includes a plurality of processing elements. The control unit is configured to provide a matrix processor instruction to the matrix computational unit. The matrix processor instruction specifies a floating-point operand formatted using a first floating-point representation format. The matrix computational unit accumulates an intermediate result value calculated using the floating-point operand. The intermediate result value is in a second floating-point representation format.
Method and apparatus with neural network parameter quantization
Provided is a processor implemented method that includes performing training or an inference operation with a neural network by obtaining a parameter for the neural network in a floating-point format, applying a fractional length of a fixed-point format to the parameter in the floating-point format, performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process, and performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the ALU.
APPARATUS AND METHOD FOR VECTOR PACKED DUAL COMPLEX-BY-COMPLEX AND DUAL COMPLEX-BY-COMPLEX CONJUGATE MULTIPLICATION
An apparatus and method for multiplying packed real and imaginary components of complex numbers and complex conjugates. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply select real and imaginary data elements in the first and second source registers to generate a plurality of real and imaginary products; adder circuitry to add/subtract various real and imaginary products, scale the results according to an immediate of the instruction, round the scaled results; and saturation circuitry to saturate the rounded results.
APPARATUS AND METHOD FOR VECTOR PACKED MULTIPLY OF SIGNED AND UNSIGNED WORDS
An apparatus and method for performing a vector packed multiplication of signed and unsigned words. For example, one embodiment of a processor includes a decoder to decode a vector packed multiply instruction having operands to identify a first and a second plurality of packed words, first and second source registers to store the first and second plurality of packed words, and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply each packed word in the first source register with a corresponding packed word in the second source register to generate a plurality of doubleword products and rounding circuitry to round each of the doubleword products according to a rounding method to generate a plurality of rounded doubleword products. Each upper word of the rounded doubleword results is then stored into a corresponding word data element positions of a destination register.
APPARATUS AND METHOD FOR VECTOR PACKED MULTIPLY OF SIGNED AND UNSIGNED WORDS
An apparatus and method for performing a vector packed multiplication of signed and unsigned words. For example, one embodiment of a processor includes a decoder to decode a vector packed multiply instruction having operands to identify a first and a second plurality of packed words, first and second source registers to store the first and second plurality of packed words, and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply each packed word in the first source register with a corresponding packed word in the second source register to generate a plurality of doubleword products and rounding circuitry to round each of the doubleword products according to a rounding method to generate a plurality of rounded doubleword products. Each upper word of the rounded doubleword results is then stored into a corresponding word data element positions of a destination register.
STOCHASTIC ROUNDING FOR NEURAL PROCESSOR CIRCUIT
Embodiments relate to a neural processor circuit that includes a neural engine and a post-processing circuit. The neural engine performs a computational task related to a neural network to generate a processed value. The post-processing circuit includes a random bit generator, an adder circuit and a rounding circuit. The random bit generator generates a random string of bits. The adder circuit adds the random string of bits to a version of the processed value to generate an added value. The rounding circuit truncates the added value to generate an output value of the computational task. The random bit generator may include a linear-feedback shift register (LFSR) that generates random numbers based on a seed. The seed may be derived from a master seed that is specific to a task of the neural network.
Acceleration circuitry
Systems, apparatuses, and methods related to acceleration circuitry are described. The acceleration circuitry may be deployed in a memory device and can include a memory resource and/or logic circuitry. The acceleration circuitry can perform operations on data to convert the data between one or more numeric formats, such as floating-point and/or universal number (e.g., posit) formats. The acceleration circuitry can perform arithmetic and/or logical operations on the data after the data has been converted to a particular format. For instance, the memory resource can receive data comprising a bit string having a first format that provides a first level of precision. The logic circuitry can receive the data from the memory resource and convert the bit string to a second format that provides a second level of precision that is different from the first level of precision.
MULTIPLICATION BY A RATIONAL IN HARDWARE WITH SELECTABLE ROUNDING MODE
A fixed logic circuit for performing multiplication of an input x by a constant rational p/q so as to calculate an output y according to a directed rounding or round-to-nearest rounding mode. Fixed logic hardware is derived comprising an addition array configured to operate on canonical signed digit (CSD) forms of binary values (a CSD array) so as to form an approximation of a multiplication of an input x [m−1:0] by a rational p/q. A truncated summation array of a finite sequence of most significant bits of an infinite CSD expansion of the rational p/q operating on the bits of the input x satisfies
Registers define a plurality of corrective constants for a respective plurality of rounding modes, and selection logic selects the respective corrective constant for that rounding mode in dependence on a rounding mode in which the truncated summation array is to operate.