G06F9/34

Inserting null vectors into a stream of vectors

Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.

ROTATIONAL DISPATCH FOR PARALLEL SLICE PROCESSOR

Supplemental instruction dispatch may be used in some instances in a parallel slice processor to dispatch additional instructions, referred to as supplemental instructions, to supplemental instruction ports of execution slices and using primary instruction ports of one or more execution slices to supply one or more source operands for such supplemental instructions. In addition, in some instances, in lieu of or in addition to supplemental instruction dispatch, selective slice partitioning may be used to selectively partition groups of execution slices in a parallel slice processor based upon a threading mode within which such execution slices are executing.

Data type based cryptographic computing

A processor includes a register to store an encoded pointer to a memory location in memory and the encoded pointer is to include an encrypted portion. The processor further includes circuitry to determine a first data encryption factor based on a first data access instruction, decode the encoded pointer to obtain a memory address of the memory location, use the memory address to access an encrypted first data element, and decrypt the encrypted first data element using a cryptographic algorithm with first inputs to generate a decrypted first data element. The first inputs include the first data encryption factor based on the first data access instruction and a second data encryption factor from the encoded pointer.

Apparatus and method for efficient gather and scatter operations

An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.

System and methods for expandably wide processor instructions

Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory.

Systems, apparatuses, and methods for data speculation execution

Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.

Systems, apparatuses, and methods for data speculation execution

Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.

TRACKING LOAD AND STORE INSTRUCTIONS AND ADDRESSES IN AN OUT-OF-ORDER PROCESSOR

A computer system, processor, and/or load-store unit has a data cache for storing data, the data cache having a plurality of entries to store the data, each data cache entry addressed by a row and a Way, each data cache row having a plurality of the data cache Ways; a first Address Directory organized and arranged the same as the data cache where each first Address Directory entry is addressed by a row and a Way where each row has a plurality of Ways; a store reorder queue for tracking the store instructions; and a load reorder queue for tracking load instruction. Each of the load and store reorder queues having a Way bit field, preferably less than six bits, for identifying the data cache Way and/or a first Address Directory Way where the Way bit field acts as a proxy for a larger address, e.g. a real page number.

Computer processor employing split crossbar circuit for operand routing and slot-based organization of functional units

A computer processor including a plurality of functional units that performs operations that produce result operands at different characteristic latencies over multiple cycles. An interconnect network provides data paths for transfer of operand data between functional units. The interconnect network includes first and second crossbar parts. The first crossbar part is configured to route result operands produced with the lowest characteristic latency to any other functional unit. The second crossbar part is configured to route result operands with higher characteristic latency relative to the lowest characteristic latency to the first crossbar part where such result operands are in turn routed to any functional unit. In another aspect, the functional units can be organized as multiple slots where each slot can produce multiple result operands of different characteristic latencies in the same cycle, and wherein each slot employs separate result registers for each characteristic latency present on the slot.

Allocating multiple operand data areas of a computer instruction within a program buffer

The disclosure herein provides systems, methods, and computer program products for managing a plurality of operands in a computer instruction. To manage the plurality of operands, a data buffer manager executed by a processor receives information from a caller. The information relates to the plurality of operands. The data buffer manager, also, compares a free data area size to a requested minimum data area of an operand size identified by the information; selects an address when the requested minimum data area is less than or equal to the free data area size; and inserts the operand at the address.