G06F9/30105

Physical reference list for tracking physical register sharing

A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.
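The reclamation idea can be sketched as reference counting. This is a minimal sketch that assumes the physical reference list holds one entry per *extra* logical reference (for example, one created by move elimination). A register is therefore returned to the free list only when it is released and has no remaining entries. The class and method names are hypothetical.

```python
class RegisterReclaimer:
    """Hypothetical model of a free list plus a physical reference list."""

    def __init__(self, num_physical: int) -> None:
        self.free_list = list(range(num_physical))
        # Assumption: maps a physical register to the number of extra logical
        # registers sharing it; registers with no sharing have no entry.
        self.ref_list: dict[int, int] = {}

    def allocate(self) -> int:
        return self.free_list.pop(0)

    def share(self, preg: int) -> None:
        # Another logical register now maps to preg; record the extra reference.
        self.ref_list[preg] = self.ref_list.get(preg, 0) + 1

    def release(self, preg: int) -> None:
        # A logical register stops mapping to preg.
        count = self.ref_list.get(preg, 0)
        if count > 1:
            self.ref_list[preg] = count - 1
        elif count == 1:
            del self.ref_list[preg]  # one logical reference still remains
        else:
            self.free_list.append(preg)  # last reference gone: reclaim
```

Keeping counts only for shared registers keeps the list small, because most physical registers are never shared.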

Logic circuitry

A logic circuitry package for a replaceable print apparatus component comprises an interface to communicate with a print apparatus logic circuit, and at least one logic circuit. The logic circuit may be configured to identify, from a command stream received from the print apparatus, parameters including a class parameter, and/or identify, from the command stream, a read request, and output, via the interface, a count value in response to a read request, the count value based on identified received parameters.

Fine-grained instruction enablement at sub-function granularity based on an indicated subrange of registers

Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move or load), and a sub-function of the function specifies the type of that function (e.g., move floating-point or move vector).
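The following sketch shows how a register operand can select the sub-function and how each sub-function can be enabled separately. It assumes a 64-entry composite register file in which registers 0 to 31 serve the floating-point sub-function and 32 to 63 serve the vector sub-function. This layout, the exception name, and the function names are all hypothetical.

```python
from enum import Enum


class SubFunction(Enum):
    FLOATING_POINT = "floating-point"
    VECTOR = "vector"


# Hypothetical subrange layout of the composite register file.
SUBRANGES = {
    SubFunction.FLOATING_POINT: range(0, 32),
    SubFunction.VECTOR: range(32, 64),
}


class FacilityUnavailable(Exception):
    """Raised when the sub-function selected by the operands is disabled."""


def decode_sub_function(reg: int) -> SubFunction:
    # The register number alone determines which sub-function (and unit) is used.
    for sub, regs in SUBRANGES.items():
        if reg in regs:
            return sub
    raise ValueError(f"register {reg} is outside the composite register file")


def execute_move(dst: int, src: int, enabled: set[SubFunction],
                 regfile: list[float]) -> SubFunction:
    sub = decode_sub_function(dst)
    if decode_sub_function(src) is not sub:
        raise ValueError("operands must lie in the same subrange")
    # Enablement is checked per sub-function, not per whole instruction.
    if sub not in enabled:
        raise FacilityUnavailable(f"{sub.value} sub-function is disabled")
    regfile[dst] = regfile[src]
    return sub
```

The same `move` opcode succeeds for vector registers and traps for floating-point registers when only the vector sub-function is enabled.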

DYNAMICALLY RECONFIGURABLE REGISTER FILE

Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second registers for a second wavefront; second determining, based on the allocation information, that allocating the second registers to the second wavefront would result in a condition in which deadlock is not possible; and in response to the second determining, allocating the second registers to the second wavefront.
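The abstract does not say how "deadlock is possible" is decided from the allocation information. As a stand-in, the sketch below uses a Banker's-algorithm style safety check. It assumes each wavefront declares the maximum number of registers it may need and holds some of them. A request is granted only if every wavefront can still finish in some order. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Wavefront:
    name: str
    held: int      # registers currently allocated to this wavefront
    max_need: int  # assumed maximum registers needed before it can finish


def is_safe(free: int, wavefronts: list[Wavefront]) -> bool:
    """Banker's-style check: can every wavefront finish in some order?"""
    available = free
    pending = list(wavefronts)
    while pending:
        for wf in pending:
            if wf.max_need - wf.held <= available:
                available += wf.held  # wf finishes and returns its registers
                pending.remove(wf)
                break
        else:
            return False  # nobody can make progress: deadlock is possible
    return True


def try_allocate(free: int, wavefronts: list[Wavefront], wf: Wavefront,
                 count: int) -> tuple[bool, int]:
    """Return (granted, new free count); refrain if deadlock would be possible."""
    if count > free or wf.held + count > wf.max_need:
        return False, free
    wf.held += count
    if is_safe(free - count, wavefronts):
        return True, free - count
    wf.held -= count  # roll back the tentative allocation
    return False, free
```

In the usage below, a request for wavefront B is refused because it would leave both wavefronts waiting on each other. A later, larger request for A is granted because A can then finish and release its registers.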

Apparatus and method for storing bounded pointers
11249912 · 2022-02-15

An apparatus and method are provided for storing bounded pointers. One example apparatus comprises a storage comprising storage elements to store bounded pointers, each bounded pointer comprising a pointer value and associated attributes including at least range information, and processing circuitry to store a bounded pointer in a chosen storage element. The storing process comprises storing in the chosen storage element a pointer value of the bounded pointer, and storing in the chosen storage element the range information of the bounded pointer, such that the range information indicates both a read range of the bounded pointer and a write range of the bounded pointer that differs from the read range. The read range comprises at least one memory address for which reading is allowed when using the bounded pointer, and the write range comprises at least one memory address to which writing is allowed when using the bounded pointer.
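Here is a minimal model of a bounded pointer whose range information covers separate read and write ranges. It assumes that each range can be represented as a half-open address interval. The actual encoding inside a storage element is not described in the abstract, and the names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedPointer:
    pointer: int
    read_range: range   # addresses that may be read through this pointer
    write_range: range  # addresses that may be written; differs from read_range

    def can_read(self, addr: int) -> bool:
        return addr in self.read_range

    def can_write(self, addr: int) -> bool:
        return addr in self.write_range
```

One pointer can thus grant read access to a whole buffer while allowing writes to only part of it.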

TUPLE ENCODING AWARE DIRECT MEMORY ACCESS ENGINE FOR SCRATCHPAD ENABLED MULTICORE PROCESSORS

Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, a first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing the data manipulation result of performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data, and transmits the control information, using a hardware data channel, to a second set of electronic circuits that performs the one or more data manipulation operations. Based on the control information, the second set of electronic circuits retrieves the tabular data from the source memory location and applies the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits then causes the data manipulation result to be stored at the destination memory location.
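The two-stage flow can be sketched in software as follows. A descriptor lookup produces control information, and a second stage fetches the column, applies the operations, and writes the result. This sketch makes several assumptions: memory is a flat little-endian `bytearray`, the hardware data channel is modeled as a plain function call, and the operation names `reverse` and `increment` are hypothetical, since the abstract does not list specific operations.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Descriptor:
    column_width: int  # bytes per column value
    num_rows: int
    source: int        # byte offset in the hypothetical flat memory
    destination: int
    operations: list[str] = field(default_factory=list)


# Hypothetical operation table; results must still fit in column_width bytes.
OPERATIONS: dict[str, Callable[[list[int]], list[int]]] = {
    "reverse": lambda col: col[::-1],
    "increment": lambda col: [v + 1 for v in col],
}


def first_circuit(descriptors: dict[int, Descriptor], pushed_address: int) -> dict:
    """Read the descriptor at the pushed address and derive control information."""
    d = descriptors[pushed_address]
    return {"desc": d, "ops": [OPERATIONS[name] for name in d.operations]}


def second_circuit(memory: bytearray, control: dict) -> None:
    """Fetch the column, apply the operations, and store the result."""
    d = control["desc"]
    w = d.column_width
    column = [
        int.from_bytes(memory[d.source + i * w : d.source + (i + 1) * w], "little")
        for i in range(d.num_rows)
    ]
    for op in control["ops"]:
        column = op(column)
    for i, v in enumerate(column):
        memory[d.destination + i * w : d.destination + (i + 1) * w] = v.to_bytes(w, "little")
```

Splitting descriptor parsing from data movement mirrors the abstract's division between the first and second sets of circuits.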

PREVENTING PREMATURE READS FROM A GENERAL PURPOSE REGISTER

Methods and apparatus for preventing premature reads from a general purpose register (GPR) including receiving an instruction comprising a source operand identifying a source GPR entry; setting a read-enabled flag based on a value in a particular entry of a source ready vector; if the read-enabled flag indicates data in the source GPR entry is ready for reading, dispatching the received instruction, including performing a read operation of the data in the source GPR entry; and if the read-enabled flag indicates data in the source GPR entry is not ready for reading, dispatching the received instruction without performing a read operation of the data in the source GPR entry.
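This sketch models the dispatch decision described above. The read-enabled flag comes from the source ready vector, and the instruction is dispatched in either case, but the GPR entry is read only when it is ready. The assumption that a not-ready operand arrives later, for example through a bypass network, is illustrative, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dispatched:
    opcode: str
    src: int
    operand: Optional[int]  # None: the GPR entry was not read at dispatch


def dispatch(opcode: str, src: int, gpr: list[int],
             source_ready: list[int]) -> Dispatched:
    # The read-enabled flag is set from the source ready vector entry for src.
    read_enabled = source_ready[src] == 1
    if read_enabled:
        return Dispatched(opcode, src, gpr[src])
    # Not ready: dispatch anyway but skip the read so stale data is never
    # consumed; the value is assumed to be delivered later.
    return Dispatched(opcode, src, None)
```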

Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit
20170269931 · 2017-09-21

The present invention provides an affine engine design for the microarchitecture of a graphics processing unit (GPU). The operand type of each instruction is detected, and physical scalar, affine, or vector registers, together with the corresponding ALUs, are then allocated to execute the instruction with the greatest performance gain and energy saving. At runtime, affine and uniform instructions are executed by the affine engine, while general vector instructions are executed by a vector engine. Because affine and uniform instructions are dispatched to the affine engine, the vector engine can enter a power-saving state, reducing the energy consumption of the GPU.
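Operand type detection can be illustrated on per-lane values. A uniform operand has the same value in every lane, an affine operand follows `base + lane * stride`, and anything else is a general vector. The sketch assumes non-empty lane lists and routes an instruction to the affine engine only when none of its operands is a general vector. It also shows why affine addition is cheap: only the (base, stride) pairs are added. The function names are hypothetical, and the sketch covers addition only, because the product of two affine values is not affine in general.

```python
def classify(values: list[int]) -> tuple[str, int, int]:
    """Return (kind, base, stride) for the per-lane values of one operand."""
    base = values[0]
    if all(v == base for v in values):
        return ("uniform", base, 0)
    stride = values[1] - base
    if all(v == base + i * stride for i, v in enumerate(values)):
        return ("affine", base, stride)
    return ("vector", 0, 0)


def select_engine(operands: list[list[int]]) -> str:
    # Any general vector operand forces execution on the vector engine.
    kinds = [classify(v)[0] for v in operands]
    return "vector" if "vector" in kinds else "affine"


def affine_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    # The lane-wise sum of two affine values is affine, so the affine engine
    # adds two (base, stride) pairs instead of N separate lanes.
    return (a[0] + b[0], a[1] + b[1])
```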

Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
09766893 · 2017-09-19

A method for executing instructions using a plurality of virtual cores for a processor. The method includes receiving an incoming instruction sequence using a global front end scheduler, and partitioning the incoming instruction sequence into a plurality of code blocks of instructions. The method further includes generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks, and allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines. The code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.
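The abstract does not define the format of an inheritance vector. The sketch below is a hypothetical interpretation. It partitions an instruction sequence into fixed-size code blocks, and then, for each block, records which earlier block last wrote each register the block reads before writing it. Those entries are the cross-block dependencies a virtual core would have to wait on. The instruction representation and block size are assumptions.

```python
Instr = tuple[str, tuple[str, ...]]  # (destination register, source registers)


def partition(seq: list[Instr], block_size: int) -> list[list[Instr]]:
    return [seq[i : i + block_size] for i in range(0, len(seq), block_size)]


def inheritance_vectors(blocks: list[list[Instr]]) -> list[dict[str, int]]:
    """For each block, map each inherited register to the block that produced it."""
    last_writer: dict[str, int] = {}
    vectors = []
    for b, block in enumerate(blocks):
        written_here: set[str] = set()
        vec: dict[str, int] = {}
        for dst, srcs in block:
            for s in srcs:
                # Only values produced by an earlier block are inherited;
                # registers no block has written come from initial state.
                if s not in written_here and s in last_writer:
                    vec[s] = last_writer[s]
            written_here.add(dst)
        for r in written_here:
            last_writer[r] = b
        vectors.append(vec)
    return vectors
```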

MASK OPERATION METHOD FOR EXPLICIT INDEPENDENT MASK REGISTER IN GPU
20220236988 · 2022-07-28

Provided is a mask operation method for an explicit independent mask register in a GPU. In the method, each GPU hardware thread can access its own eight 128-bit-wide independent mask registers, denoted $m0-$m7. Four groups of mask operation instructions are available to the user; they respectively implement a reduction operation, an extension operation, and a logic operation on the mask registers, and data movement between a mask register and a general vector register.
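Here is a minimal model of one thread's eight 128-bit mask registers. It uses Python integers as bit vectors and shows logic operations, two example reductions, and movement between a mask register and a vector register. The sketch makes three assumptions: the vector register has one lane per mask bit, "any" and population count stand in for the unspecified reductions, and lane `i` maps to bit `i`. The extension operation is not modeled because the abstract does not define it. All names are hypothetical.

```python
WIDTH = 128
ALL_ONES = (1 << WIDTH) - 1


class MaskFile:
    """Hypothetical per-thread mask registers $m0-$m7, each 128 bits wide."""

    def __init__(self) -> None:
        self.m = [0] * 8

    # Logic operations on mask registers.
    def and_(self, d: int, a: int, b: int) -> None:
        self.m[d] = self.m[a] & self.m[b]

    def or_(self, d: int, a: int, b: int) -> None:
        self.m[d] = self.m[a] | self.m[b]

    def not_(self, d: int, a: int) -> None:
        self.m[d] = ~self.m[a] & ALL_ONES  # keep the result within 128 bits

    # Example reductions of a mask register to a scalar.
    def popcount(self, a: int) -> int:
        return bin(self.m[a]).count("1")

    def any_(self, a: int) -> bool:
        return self.m[a] != 0

    # Data movement between a mask register and a vector register.
    def from_vector(self, d: int, lanes: list[int]) -> None:
        self.m[d] = sum(1 << i for i, v in enumerate(lanes[:WIDTH]) if v)

    def to_vector(self, a: int) -> list[int]:
        return [(self.m[a] >> i) & 1 for i in range(WIDTH)]
```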