G06F9/315

Replicate partition instruction
11947962 · 2024-04-02 · ·

In response to a replicate partition instruction specifying partition information defining positions of a plurality of variable size partitions within a result vector, an instruction decoder (20) controls the processing circuitry (80) to generate a result vector in which each partition having more than one data element comprises data values or element indices of a sequence of data elements of a source vector starting or ending at a selected data element position. This instruction can be useful for accelerating processing of data structures smaller than the vector length.

Mixed-width SIMD operations using even/odd register pairs for wide data elements

Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to data elements of the other vector operand expressed as a single register.

Sliding window operation
10459731 · 2019-10-29 · ·

A first register has a lane storing first input data and a second register has a lane storing second input data elements. A width of the lane of the second register is equal to a width of the lane of the first register. A single-instruction-multiple-data (SIMD) lane has a lane width equal to the width of the lane of the first register. The SIMD lane is configured to perform a sliding window operation on the first input data elements in the lane of the first register and the second input data elements in the lane of the second register. Performing the sliding window operation includes determining a result based on a first input data element stored in a first position of the first register and a second input data element stored in a second position of the second register. The second position is different from the first position.

Decimal load immediate instruction

An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.

Replicate elements instruction
11977884 · 2024-05-07 · ·

A replicate elements instruction defining a plurality of variable length segments in a result vector controls processing circuitry (80) to generate a result vector in which, in each respective segment, a repeating value is repeated throughout that segment of the result vector, the repeating value comprising a data value or element index of a selected data element of a source vector. This instructions is useful for accelerating processing of data structures smaller than the vector length.

Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate

Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.

Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets
10339057 · 2019-07-02 · ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.

Decimal load immediate instruction

An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.

Apparatus and method for performing a splice of vectors based on location and length data
12061906 · 2024-08-13 · ·

An apparatus and a method are provided for performing a splice operation, the apparatus having a set of vector registers and one or more control registers. Processing circuitry is arranged to execute a sequence of instructions including a splice instruction that identifies at least a first vector register and at least one control register. The first vector register stores a first vector of data elements having a vector length, and the at least one control register stores control data identifying one or more data elements occupying sequential data element positions within the first vector of data elements. The processing circuitry is responsive to execution of the splice instruction to extract from the first vector each data element identified by the control data in the at least one control register, and to output the extracted data elements within sequential data element positions of the result vector starting from a first end of the result vector, and data elements from a second vector are output to the remaining result vector data element positions not occupied by the extracted data elements from the first vector.

Swap operations in memory
10157126 · 2018-12-18 · ·

Examples of the present disclosure provide apparatuses and methods related to performing swap operations in a memory. An example apparatus might include a first group of memory cells coupled to a first sense line and configured to store a first element. An example apparatus might also include a second group of memory cells coupled to a second sense line and configured to store a second element. An example apparatus might also include a controller configured to cause the first element to be stored in the second group of memory cells and the second element to be stored in the first group of memory cells by controlling sensing circuitry to perform a number operations without transferring data via an input/output (I/O) line.