Patent classifications
G06F9/38873
Instruction for accelerating SNOW 3G wireless security algorithm
Vector instructions for performing SNOW 3G wireless security operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first operand of the first instruction specifying a first vector register that stores a current state of a finite state machine (FSM). The execution circuitry also receives a second operand of the first instruction specifying a second vector register that stores data elements of a liner feedback shift register (LFSR) that are needed for updating the FSM. The execution circuitry executes the first instruction to produce a updated state of the FSM and an output of the FSM in a destination operand of the first instruction.
Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods
Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations are disclosed. Related vector processor systems and methods are also disclosed. Format conversion circuitry is provided in data flow paths between vector data memory and execution units in the VPE. The format conversion circuitry is configured to convert input vector data sample sets fetched from vector data memory in-flight while the input vector data sample sets are being provided over the data flow paths to the execution units to be processed. In this manner, format conversion of the input vector data sample sets does not require pre-processing, storage, and re-fetching from vector data memory, thereby reducing power consumption and not limiting efficiency of the data flow paths by format conversion pre-processing delays.
Apparatus and method for transferring a plurality of data structures between memory and a plurality of vector registers
An apparatus and method are provided for transferring a plurality of data structures between memory and a plurality of vector registers, each vector register being arranged to store a vector operand comprising a plurality of data elements. Access circuitry is used to perform access operations to move data elements of vector operands between the data structures in memory and specified vector registers, each data structure comprising multiple data elements stored at contiguous addresses in the memory. Decode circuitry is responsive to a single access instruction identifying a plurality of vector registers and a plurality of data structures that are located discontiguously with respect to each other in the memory, to generate control signals to control the access circuitry to perform a sequence of access operations to move the plurality of data structures between the memory and the plurality of vector registers such that the vector operand in each vector register holds a corresponding data element from each of the plurality of data structures. This provides a very efficient mechanism for performing complex access operations, resulting in an increase in execution speed, and potential reductions in power consumption.
Processing instructions selected from a first instruction set in a first processing mode and instructions selected from a second different instruction set in a second processing mode
Instruction decoder to decode processing instructions; one or more first registers; first processing circuitry to execute the decoded processing instructions in a first processing mode and configured to execute the decoded processing instructions using the one or more first registers; and control circuitry to execute the decoded processing instructions in a second processing mode using one or more second registers; the instruction decoder being configured to decode processing instructions selected from a first instruction set and a second instruction set in the second processing mode, in which one or both of the first and second instruction sets comprises at least one unique instruction set; the instruction decoder configured to decode one or more mode change instructions to change between the first and second processing mode; and the first processing circuitry configured to change the current processing mode between the first and second processing mode responding to executing mode change instruction.
INSTRUCTION AND LOGIC TO PROVIDE VECTOR SCATTER-OP AND GATHER-OP FUNCTIONALITY
Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
Method and Apparatus to Process 4-Operand SIMD Integer Multiply-Accumulate Intruction
According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
Cryptographic support instructions
A data processing system includes a single instruction multiple data register file and single instruction multiple processing circuitry. The single instruction multiple data processing circuitry supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.
APPROACH FOR A CONFIGURABLE PHASE-BASED PRIORITY SCHEDULER
A streaming multiprocessor (SM) in a parallel processing subsystem schedules priority among a plurality of threads. The SM retrieves a priority descriptor associated with a thread group, and determines whether the thread group and a second thread group are both operating in the same phase. If so, then the method determines whether the priority descriptor of the thread group indicates a higher priority than the priority descriptor of the second thread group. If so, the SM skews the thread group relative to the second thread group such that the thread groups operate in different phases, otherwise the SM increases the priority of the thread group. f the thread groups are not operating in the same phase, then the SM increases the priority of the thread group. One advantage of the disclosed techniques is that thread groups execute with increased efficiency, resulting in improved processor performance.
Apparatus and method for comparing a first vector of data elements and a second vector of data elements
A data processing apparatus includes a comparison unit configured to perform an element comparison process performing a comparison of a first data element at a first index in the first vector with a second data element at a second index in the second vector. A hazard vector generation unit is configured to populate a hazard vector at an index determined by the first index with a value determined by the second index. The comparison unit performs the element comparison process by iteratively comparing data elements of the first vector with each element of a subset of the second vector. It then determines the subset of the second vector as those data elements at indices in the second vector which are less than a current index of the first vector and which are greater than previously determined values of the second index for which the match condition was true.
Instructions and Logic for Set-Multiple-Vector-Elements Operations
A processor includes an execution unit to execute instructions to set data elements of different types, from different source vector registers, in destination vectors of multiple-element data structures, each including elements of multiple types. The execution unit includes logic to extract data elements from specific positions within each source vector register dependent on an instruction encoding or parameter. A vector SET3 instruction encoding specifies that respective data elements be extracted from the same positions within first, second, and third source vector registers to assemble multiple XYZ-type data structures. A vector SET4 instruction encoding specifies that respective data elements be extracted from the same positions within two source vector registers to assemble half the elements of multiple XYZW-type data structures. The execution unit includes logic to place the reorganized data elements in contiguous locations (SET3 operations), or successive even or odd locations (SET4 operations) in the destination vector.