Patent classifications
G06F9/3012
GATHERING PAYLOAD FROM ARBITRARY REGISTERS FOR SEND MESSAGES IN A GRAPHICS ENVIRONMENT
An apparatus to facilitate gathering payload from arbitrary registers for send messages in a graphics environment is disclosed. The apparatus includes processing resources comprising execution circuitry to receive a send gather message instruction identifying a number of registers to access for a send message and identifying IDs of a plurality of individual registers corresponding to the number of registers; decode a first phase of the send gather message instruction; based on decoding the first phase, cause a second phase of the send gather message instruction to bypass an instruction decode stage; and dispatch the first phase subsequently followed by dispatch of the second phase to a send pipeline. The apparatus can also perform an immediate move of the IDs of the plurality of individual registers to an architectural register of the execution circuitry and include a pointer to the architectural register in the send gather message instruction.
Hierarchical register file device based on spin transfer torque-random access memory
The embodiments provide a register file device which increases energy efficiency using a spin transfer torque-random access memory for a register file used to compute a general purpose graphic processing device, and hierarchically uses a register cache and a buffer together with the spin transfer torque-random access memory, to minimize leakage current, reduce a write operation power, and solve the write delay.
Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
PROCESSING DEVICE AND METHOD OF SHARING STORAGE BETWEEN CACHE MEMORY, LOCAL DATA STORAGE AND REGISTER FILES
An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.
Processor for avoiding reduced performance using instruction metadata to determine not to maintain a mapping of a logical register to a physical register in a first level register file
A processor includes a first level register file, second level register file, and register file mapper. The first and second level register files are comprised of physical registers, with the first level register file more efficiently accessed relative to the second level register file. The register file mapper is coupled with the first and second level register files. The register file mapper comprises a mapping structure and register file mapper controller. The mapping structure hosts mappings between logical registers and physical registers of the first level register file. The register file mapper controller determines whether to map a destination logical register of an instruction to a physical register in the first level register file. The register file mapper controller also determines, based on metadata associated with the instruction, whether to write data associated with the destination logical register to one of the physical registers of the second level register file.
Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
Back-up and restoration of register data
A system includes: a processor; a register configured to store a plurality of words, non-volatile memory having a plurality of cells, each cell corresponding to one of the words of the register, and wherein the each cell of the plurality of cells are set to an initial reset value; a first controller that in response to a loss in power: determines the word stored by the register; and changes the initial reset value of the cell of the non-volatile memory corresponding to the determined word stored by the register to a set value; a second controller that in response to detecting a restoration in power: identifies the cell having the set value; writes the word corresponding to the identified cell to the register; and resets the cells of the non-volatile memory to the initial reset value.
Managing control data
There is provided a neural processing unit (NPU), including a primary processing node containing primary control registers and processing circuitry configured to write control data to the primary control registers, and multiple secondary processing nodes each having respective secondary control registers and being configured to process data in accordance with control data stored by the respective secondary control registers. The NPU also includes a bus interface for transmitting data between the primary processing node and the plurality of secondary processing nodes. The primary processing node is configured to transmit first control data to a given secondary control register of each of the plurality of secondary processing nodes.
SPARSE MATRIX CALCULATIONS UTILIZING TIGHTLY COUPLED MEMORY AND GATHER/SCATTER ENGINE
A processor for sparse matrix calculation includes an on-chip memory, a cache, a gather/scatter engine, and a core. The on-chip memory stores a first matrix or vector, and the cache stores a compressed sparse second matrix data structure. The compressed sparse second matrix data structure includes a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine gathers element values of the first matrix or vector using the column index array of the sparse second matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine gathers sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix.
ROUTING INSTRUCTION RESULTS TO A REGISTER BLOCK OF A SUBDIVIDED REGISTER FILE BASED ON REGISTER BLOCK UTILIZATION RATE
A system, processor, programming product and/or method for assigning instructions to destination register file blocks, and/or routing instructions, includes: providing a processing pipeline having two or more execution units configured to process instructions; providing a register file having register file entries configured to hold data, where the register file is subdivided into a plurality of register blocks and each register block has two or more register file entries; calculating a utilization rate for one or more register blocks; and assigning and/or routing an instruction to write its results to a register block based upon the utilization rate for that register block. Preferably the execution unit is configured to write its results to a single specific destination (rename) register block.