G06F9/30038

SM4 acceleration processors, methods, systems, and instructions
10447468 · 2019-10-15 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SM4 cryptographic rounds, and four 32-bit values. The processor also includes an execution unit coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store four 32-bit results of four immediately subsequent and sequential SM4 cryptographic rounds in a destination storage location that is to be indicated by the instruction.

COMPUTATION PROCESSING APPARATUS AND METHOD OF PROCESSING COMPUTATION
20240143331 · 2024-05-02 · ·

A computation processing apparatus includes: a memory; and a processor coupled to the memory and configured to: decode instructions; execute the instructions which is decoded and operate as a plurality of sub-computation processing apparatuses in accordance with a bit width of data to be computed; and observe an operation state of the computation processing apparatus, wherein, when observing that a subset of the plurality of sub-computation processing apparatuses does not execute an instruction or instructions, the processor parallelizes the instructions and outputs the parallelized instructions.

SM4 acceleration processors, methods, systems, and instructions
10425222 · 2019-09-24 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SM4 cryptographic rounds, and four 32-bit values. The processor also includes an execution unit coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store four 32-bit results of four immediately subsequent and sequential SM4 cryptographic rounds in a destination storage location that is to be indicated by the instruction.

PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO GENERATE SEQUENCES OF INTEGERS IN WHICH INTEGERS IN CONSECUTIVE POSITIONS DIFFER BY A CONSTANT INTEGER STRIDE AND WHERE A SMALLEST INTEGER IS OFFSET FROM ZERO BY AN INTEGER OFFSET

A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.

SM4 acceleration processors, methods, systems, and instructions
10419210 · 2019-09-17 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SM4 cryptographic rounds, and four 32-bit values. The processor also includes an execution unit coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store four 32-bit results of four immediately subsequent and sequential SM4 cryptographic rounds in a destination storage location that is to be indicated by the instruction.

Vector operand bitsize control
10409602 · 2019-09-10 · ·

A data processing system (2) includes processing circuitry (18) and decoder circuitry (14) for decoding program instructions and controlling the processor circuitry. The decoder circuitry is responsive to a vector operand bit size dependant instruction executed within a selected exception level state of a hierarchy of exception level states to control the processing circuitry to perform processing with a vector operand bit size governed by a limiting value of the vector operand bit size associated with the currently selected exception level state, any programmable limit value set for an exception level state closer to a top exception level state within the hierarchy and the implemented limit.

INTERRUPTIBLE AND RESTARTABLE MATRIX MULTIPLICATION INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Superimposing butterfly network controls for pattern combinations

A multilayer butterfly network is shown that is operable to transform and align a plurality of fields from an input to an output data stream. Many transformations are possible with such a network which may include separate control of each multiplexer. This invention supports a limited set of multiplexer control signals, which enables a similarly limited set of data transformations. This limited capability is offset by the reduced complexity of the multiplexor control circuits. This invention used precalculated inputs and simple combinatorial logic to generate control signals for the butterfly network. Controls are independent for each layer and therefore are dependent only on the input and output patterns. Controls for the layers can be calculated in parallel.

Sliding window encoding methods for executing vector compare instructions to write distance and match information to different sections of the same register

A processor is described having an instruction execution pipeline having a functional unit to execute an instruction that compares vector elements against an input value. Each of the vector elements and the input value have a first respective section identifying a location within data and a second respective section having a byte sequence of the data. The functional unit has comparison circuitry to compare respective byte sequences of the input vector elements against the input value's byte sequence to identify a number of matching bytes for each comparison. The functional unit also has difference circuitry to determine respective distances between the input vector's elements' byte sequences and the input value's byte sequence within the data.

Perform sign operation decimal instruction

An instruction to perform a sign operation of a plurality of sign operations configured for the instruction. The instruction is executed, and the executing includes selecting at least a portion of an input operand as a result to be placed in a select location. The selecting is based on a control of the instruction, in which the control indicates a user-defined size of the input operand to be selected as the result. A sign of the result is determined based on a plurality of criteria, including a value of the result, obtained based on the control of the instruction, having a first particular relationship or a second particular relationship with respect to a selected value. The result and the sign are stored in the select location to provide a signed output to be used in processing within the computing environment.