G06F7/49915

Method and apparatus for vector based matrix multiplication

A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.

Method and apparatus to sort a vector for a bitonic sorting algorithm

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

Method and apparatus for dual issue multiply instructions

A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.

Method and apparatus for vector permutation

A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.

Method and apparatus for dual multiplication units in a data path

A processor is provided that includes a first multiplication unit in a first data path of the processor, the first multiplication unit configured to perform single issue multiply instructions, and a second multiplication unit in the first data path, the second multiplication unit configured to perform single issue multiply instructions, wherein the first multiplication unit and the second multiplication unit are configured to execute respective single issue multiply instructions in parallel.

Argmax use for machine learning

A processing device is provided which comprises memory and a processor. The processor is configured to receive an array of floating point numbers each having a plurality of bits used to represent a probability value. For each floating point number, the processor is configured to replace values in a portion of the bits used to represent the probability value with index values to represent an index corresponding to a location of a corresponding floating point number in the memory. The processor is also configured to process the floating point numbers using SIMD instructions to execute one of an argmax operation and an argmin operation.

Kernel coefficient quantization

Apparatuses, systems, and techniques to optimize memory usage when performing matrix operations. In at least one embodiment, a matrix is optimized to limit memory and storage requirements while minimizing loss of precision for a sum of the members of the matrix.

Acceleration circuitry

Systems, apparatuses, and methods related to acceleration circuitry are described. The acceleration circuitry may be deployed in a memory device and can include a memory resource and/or logic circuitry. The acceleration circuitry can perform operations on data to convert the data between one or more numeric formats, such as floating-point and/or universal number (e.g., posit) formats. The acceleration circuitry can perform arithmetic and/or logical operations on the data after the data has been converted to a particular format. For instance, the memory resource can receive data comprising a bit string having a first format that provides a first level of precision. The logic circuitry can receive the data from the memory resource and convert the bit string to a second format that provides a second level of precision that is different from the first level of precision.

HIGHLY INTEGRATED SCALABLE, FLEXIBLE DSP MEGAMODULE ARCHITECTURE

Disclosed embodiments include an electronic device having a processor core, a memory, a register, and a data load unit to receive a plurality of data elements stored in the memory in response to an instruction. All of the data elements hare the same data size, which is specified by one or more coding bits. The data load unit includes an address generator to generate addresses corresponding to locations in the memory at which the data elements are located, and a formatting unit to format the data elements. The register is configured to store the formatted data elements, and the processor core is configured to receive the formatted data elements from the register.

Arithmetic and logical operations in a multi-user network
11074100 · 2021-07-27 · ·

Systems, apparatuses, and methods related to arithmetic and logical operations in a multi-user network are described. An agent may be provisioned with a pool of shared computing resources that includes circuitry to perform operations on data (e.g., one or more posit bit strings) in a multi-user network. The circuitry can perform operations on data to convert the data between one or more formats, such as floating-point and/or universal number (e.g., posit) formats, and can further perform arithmetic and/or logical operations on the converted data. The agent may receive a parameter corresponding to performance of an arithmetic operation and/or a logical operation using one or more posit bit strings and cause performance of the arithmetic operation and/or the logical operation using the one or more posit bit strings.