Patent classifications
G06F9/305
Vector logical operation and test instructions with result negation
Systems, methods, and apparatuses relating to performing logical operations on packed data elements and testing the results of that logical operation to generate a packed data resultant are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, the instruction having fields that identify a first packed data source, a second packed data source, and a packed data destination, and an opcode that indicates a bitwise logical operation to perform on the first packed data source and the second packed data source and indicates a width of each element of the first packed data source and the second packed data source; and an execution circuit to execute the decoded instruction to perform the bitwise logical operation indicated by the opcode on the first packed data source and the second packed data source to produce a logical operation result of packed data elements having a same width as the width indicated by the opcode, perform a test operation on each element of the logical operation result to set a corresponding bit in a packed data test operation result to a first value when any of the bits in a respective element of the logical operation result are set to the first value, and set the corresponding bit to a second value otherwise, and store the packed data test operation result into the packed data destination.
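The semantics can be modeled in software. The sketch below (not the patented hardware; the element width and operand layout are illustrative) applies a bitwise operation per lane, then sets one destination bit per lane when any bit of that lane's result is set:

```python
def logical_test(src1, src2, op, width_bits):
    """op is the bitwise logical operation, e.g. lambda a, b: a & b."""
    lane_mask = (1 << width_bits) - 1
    dest = 0
    for i, (a, b) in enumerate(zip(src1, src2)):
        elem = op(a, b) & lane_mask   # logical operation result for this element
        if elem != 0:                 # test: any bit set in the element?
            dest |= 1 << i            # set the corresponding destination bit
    return dest

# AND of two 4-element vectors of 8-bit lanes: only lanes 2 and 3
# produce a nonzero result, so the destination mask is 0b1100.
result = logical_test([0x0F, 0x00, 0xF0, 0xFF],
                      [0xF0, 0xFF, 0xF0, 0x01],
                      lambda a, b: a & b, 8)
```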
Method and apparatus for performing reduction operations on a set of vector elements
An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.
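A scalar model of the index-directed reduction (a sketch of the behavior, not the hardware): elements of the value vector that share an index value are combined, here by addition, into per-index accumulators:

```python
def indexed_reduce(values, indices, combine=lambda a, b: a + b):
    """Combine values whose corresponding index values match."""
    acc = {}
    for v, idx in zip(values, indices):
        # values associated with one another (same index) are reduced together
        acc[idx] = combine(acc[idx], v) if idx in acc else v
    return acc

# index 0 groups 1, 3, 5; index 1 groups 2, 4
sums = indexed_reduce([1, 2, 3, 4, 5], [0, 1, 0, 1, 0])
```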
Method and apparatus for vector index load and store
An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.
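A minimal sketch of the load path, assuming the immediate selects one index from the index register and the mask gates each lane (names and the flat-memory model are illustrative):

```python
def vector_index_load(memory, base, indices, imm, mask, vlen):
    # the immediate value identifies which stored index to use
    addr = base + indices[imm]
    dest = [0] * vlen
    for lane in range(vlen):
        if mask[lane]:                    # load only where the mask bit is set
            dest[lane] = memory[addr + lane]
    return dest

# base 10 + indices[1] = 14; lane 1 is masked off and stays zero
mem = list(range(100))
loaded = vector_index_load(mem, 10, [0, 4, 8], 1, [1, 0, 1, 1], 4)
```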
Method and apparatus for efficiently managing architectural register state of a processor
An apparatus and method for efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: a source mask register to be logically subdivided into at least a first portion to store a usable portion of a mask value and a second portion to store an indication of whether the usable portion of the mask value has been updated; a control register to store an unusable portion of the mask value; architectural state management logic to read the indication to determine whether the mask value has been updated prior to performing a store operation, wherein if the mask value has been updated, then the architectural state management logic is to read the usable portion of the mask value from the first portion of the source mask register and zero out bits of the unusable portion of the mask value to generate a final mask value to be saved to memory, and wherein if the mask value has not been updated, then the architectural state management logic is to concatenate the usable portion of the mask value with the unusable portion of the mask value read from the control register to generate a final mask value to be saved to memory.
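The two save paths can be sketched as follows (a software model only; the 16-bit usable portion is an assumed width, not taken from the patent):

```python
USABLE_BITS = 16  # assumed width of the usable portion

def save_mask(usable, updated, control_unusable):
    low = usable & ((1 << USABLE_BITS) - 1)
    if updated:
        # mask was updated: keep the usable portion, zero the unusable bits
        return low
    # mask not updated: concatenate the control register's unusable
    # portion (high bits) with the usable portion (low bits)
    return (control_unusable << USABLE_BITS) | low
```

The point of the two paths is that an unmodified mask can be reconstructed from the control register, so the full mask value need not be kept live in the mask register.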
Floating point instruction with selectable comparison attributes
An instruction to perform a comparison of a first value and a second value is executed. Based on a control of the instruction, a compare function to be performed is determined. The compare function is one of a plurality of compare functions configured for the instruction, and the compare function has a plurality of options for comparison. A compare option based on the first value and the second value is selected from the plurality of options defined for the compare function, and used to compare the first value and the second value. A result of the comparison is then placed in a select location, the result to be used in processing within a computing environment.
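As an illustration of control-selected compare behavior (the function names and the NaN-handling option shown are hypothetical stand-ins, not the instruction's actual controls or encodings):

```python
import math

def fp_compare(a, b, func):
    """'func' models the instruction control selecting the compare function;
    the option applied depends on the operand values (here, NaN-ness)."""
    if func == "min":
        if math.isnan(a):
            return b          # option chosen from the values: prefer non-NaN
        if math.isnan(b):
            return a
        return a if a < b else b
    if func == "max":
        if math.isnan(a):
            return b
        if math.isnan(b):
            return a
        return a if a > b else b
    raise ValueError(f"unknown compare function: {func}")
```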
Merging and sorting arrays on an SIMD processor
Methods, systems, and articles of manufacture for merging and sorting arrays on a processor are provided herein. A method includes splitting an input array into multiple sub-arrays across multiple processing elements; merging the multiple sub-arrays into multiple vectors; and sorting the multiple vectors by comparing and swapping one or more vector elements among the multiple vectors.
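The split-merge-sort flow can be sketched in scalar code, with `heapq.merge` standing in for the element-wise compare-and-swap merging the method performs across vectors (the chunking scheme is an assumption):

```python
import heapq

def simd_merge_sort(arr, num_pe):
    # split the input array into sub-arrays across processing elements
    chunk = (len(arr) + num_pe - 1) // num_pe
    subs = [sorted(arr[i:i + chunk]) for i in range(0, len(arr), chunk)]
    # merge the sorted sub-arrays; on SIMD hardware this step is done by
    # comparing and swapping elements among the vectors
    return list(heapq.merge(*subs))
```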
Method for a delayed branch implementation by using a front end track table
A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.
Efficient mapping of input data to vectors for a predictive model
A system may comprise a processor integrated circuit (IC) and a vector mapping sub-system that is separate from the processor IC and includes one or more ICs. The system may receive input data for processing by a predictive model and generate at least one memory address from the input data. At least one memory address may be provided to the vector mapping sub-system. The vector mapping sub-system generates a resulting vector of numbers based on the at least one memory address. The resulting vector can be a fixed length vector representation of the input data. The resulting vector is provided from the vector mapping sub-system to the processor IC. The processor IC executes one or more instructions for the predictive model using the resulting vector to generate a prediction. A corresponding method also is disclosed.
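The address-to-vector lookup resembles an embedding-table access. In this sketch, CRC32 is an illustrative stand-in for the system's address generation, and the table is a hypothetical in-memory substitute for the vector mapping sub-system:

```python
import zlib

def map_to_vector(data: bytes, table, dim):
    # generate a memory address from the input data
    addr = zlib.crc32(data) % len(table)
    # the address selects a fixed-length vector representation
    return table[addr][:dim]
```

The same input always yields the same address, so the mapping is a deterministic fixed-length representation suitable as model input.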
Hardware accelerators and methods for high-performance authenticated encryption
Methods and apparatuses relating to high-performance authenticated encryption are described. A hardware accelerator may include a vector register to store an input vector of a round of an encryption operation; a circuit including a first data path including a first modular adder coupled to a first input from the vector register and a second input from the vector register, and a second modular adder coupled to the first modular adder and a second data path from the vector register, and the second data path including a first logical XOR circuit coupled to the second input and a third data path from the vector register, a first rotate circuit coupled to the first logical XOR circuit, a second logical XOR circuit coupled to the first rotate circuit and the third data path, and a second rotate circuit coupled to the second logical XOR circuit; and a control circuit to cause the first modular adder and the second modular adder of the first data path and the first logical XOR circuit, the second logical XOR circuit, the first rotate circuit, and the second rotate circuit of the second data path to perform a portion of the round according to one or more control values, and store a first result from the first data path for the portion and a second result from the second data path for the portion into the vector register.
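The modular-add, XOR, and rotate data paths described are characteristic of ARX ciphers. As a concrete software analogue (not the patented circuit itself), the ChaCha20 quarter-round exercises exactly those operations, and RFC 8439 provides a known test vector:

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def quarter_round(a, b, c, d):
    # each half-step is a modular add, an XOR, and a rotate, matching
    # the adder/XOR/rotate data-path structure described above
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d
```

A hardware accelerator with the wiring described can evaluate such a round portion in a single pass, reading operands from and writing results back to the vector register under control-circuit sequencing.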
Conflict mask generation
Single Instruction, Multiple Data (SIMD) technologies are described. A processor can store a first bitmap and generate a second bitmap with each cell identifying a mask bit. The mask bit is set when 1) a corresponding cell in a first bitmap is not in conflict with other elements in the first bitmap or 2) a corresponding cell is in conflict with one or more other cells in the first bitmap and is a last cell in a sequential order of the first bitmap that conflicts with the one or more other cells, wherein a position of each cell in the second bitmap maps to a same position of the corresponding cell in the first bitmap. The processor can store the second bitmap as a mask for a scatter operation to avoid lane conflicts.
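The mask rule can be modeled directly: a lane's bit is set when its index is unique or when it is the last lane in sequential order carrying that index, so only "last writer" lanes scatter. A minimal sketch:

```python
def conflict_mask(indices):
    mask = 0
    for i, idx in enumerate(indices):
        # set the bit if no later lane uses the same index: the lane is
        # either conflict-free or the last of its conflicting group
        if idx not in indices[i + 1:]:
            mask |= 1 << i
    return mask

# lanes 0 and 2 both target index 3; only lane 2 (the last) is unmasked
m = conflict_mask([3, 1, 3, 2])
```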