Patent classifications
G06F9/3875
Replicating logic blocks to enable increased throughput with sequential enabling of input register blocks
A datapath pipeline which uses replicated logic blocks to increase the throughput of the pipeline is described. In an embodiment, the pipeline, or a part thereof, comprises a number of parallel logic paths each comprising the same logic. Input register stages at the start of each logic path are enabled in turn on successive clock cycles such that data is read into each logic path in turn and the logic in the different paths operates out of phase. The output of the logic paths is read into one or more output register stages and the logic paths are combined using a multiplexer which selects an output from one of the logic paths on any clock cycle. Various optimization techniques are described and in various examples, register retiming may also be used. In various examples, the datapath pipeline is within a processor.
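The round-robin enabling of input registers and the output multiplexer described above can be sketched as a small cycle-level model. This is only an illustration under stated assumptions: the function name `run_replicated_pipeline`, the choice that each logic path needs exactly `n_paths` clock cycles to settle, and the use of a Python callable for the replicated logic are hypothetical and do not come from the abstract.

```python
def run_replicated_pipeline(inputs, logic, n_paths):
    """Cycle-level model of n_paths identical logic paths working out of phase.

    Assumption: each path needs n_paths clock cycles to produce a stable result,
    so a path's result is ready when its input register is next enabled.
    """
    input_regs = [None] * n_paths          # one input register stage per logic path
    outputs = []
    total_cycles = len(inputs) + n_paths   # extra cycles drain the last inputs
    for cycle in range(total_cycles):
        sel = cycle % n_paths              # path whose turn it is on this cycle
        # Output multiplexer: select the path that has had n_paths cycles to settle.
        if input_regs[sel] is not None:
            outputs.append(logic(input_regs[sel]))
            input_regs[sel] = None
        # Sequential enable: only input register `sel` captures new data.
        if cycle < len(inputs):
            input_regs[sel] = inputs[cycle]
    return outputs


squares = run_replicated_pipeline([1, 2, 3, 4, 5], lambda x: x * x, n_paths=3)
```

In this model one input is accepted on every clock cycle even though each individual path takes three cycles, which is the throughput gain the abstract attributes to replication.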
LOCATION AGNOSTIC DATA ACCESS
Apparatuses, systems, and techniques are described that enable a program to access data regardless of where the data is stored. In at least one embodiment, a system does so based on, for example, one or more locations encoding one or more addresses of the data.
METADATA PREDICTOR
Embodiments are described for a metadata predictor. An index pipeline generates indices in an index buffer, and the indices are used to read out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache. The prediction cache is populated asynchronously to the operation of the prediction pipeline.
Graphics Processing
Disclosed is a method of handling thread termination events within a graphics processor when a group of plural execution lanes is executing in a co-operative state. When a group of lanes is in the co-operative state and the graphics processor encounters an event that means that a subset of one or more execution threads associated with the group should be terminated, it is determined whether a condition to immediately terminate the subset is met. When the condition is not met, the group of execution lanes continues its execution in the co-operative state, but a record is stored to track that the threads in the subset should subsequently be terminated.
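The deferred-termination bookkeeping can be sketched as follows. Everything here is a hypothetical model: the class `LaneGroup`, the `condition_met` flag (the abstract does not say what the condition is), and the choice to apply the stored record when the group leaves the co-operative state are assumptions made for illustration.

```python
class LaneGroup:
    """Toy model of a group of execution lanes (names are illustrative only)."""

    def __init__(self, n_lanes):
        self.alive = [True] * n_lanes   # one thread per lane in this sketch
        self.cooperative = False        # True while lanes run in the co-operative state
        self.pending = set()            # record of threads to terminate later

    def on_terminate_event(self, lanes, condition_met):
        # Terminate at once if the group is not co-operating or the
        # (unspecified, assumed) immediate-termination condition holds.
        if not self.cooperative or condition_met:
            self._kill(lanes)
        else:
            # Otherwise keep executing and only record the request.
            self.pending.update(lanes)

    def leave_cooperative(self):
        # Assumption: recorded terminations are applied on leaving the state.
        self.cooperative = False
        self._kill(self.pending)
        self.pending = set()

    def _kill(self, lanes):
        for lane in lanes:
            self.alive[lane] = False


group = LaneGroup(4)
group.cooperative = True
group.on_terminate_event({1, 2}, condition_met=False)
alive_while_cooperating = list(group.alive)
pending_while_cooperating = set(group.pending)
group.leave_cooperative()
```

While the group is co-operating, all four lanes stay alive and lanes 1 and 2 are only recorded; they are terminated once the group leaves the co-operative state.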
Exception register delay
A processor includes: memory; an execution pipeline having a plurality of pipeline stages configured to process data provided to the execution pipeline and to store a result of the processing into the memory; a receive pipeline having a plurality of pipeline stages configured to handle incoming data to the processor and to store the incoming data into the memory; context status storage configured to hold an exception indicator of an exception encountered by the execution pipeline while the execution pipeline processes data; wherein the receive pipeline is configured to determine that an exception has been committed to the context status storage by the execution pipeline, to suppress a write to the memory of any incoming data to be handled by the receive pipeline and to commit a corresponding exception indicator to the context status storage at a final one of its pipeline stages.
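A rough behavioural sketch of the receive pipeline follows. It is not taken from the patent: the function name, the dictionary keys `execution_exception` and `receive_exception`, and the decision to sample the execution pipeline's exception indicator at the first receive stage are all assumptions made to keep the model small.

```python
def run_receive_pipeline(incoming, memory, context_status, n_stages=3):
    """Push (address, value) packets through an n_stages receive pipeline.

    Assumption: a packet is tagged as suppressed on entry if the execution
    pipeline has already committed an exception to context_status.
    """
    stages = [None] * n_stages                  # stages[-1] is the final stage
    for packet in list(incoming) + [None] * n_stages:   # trailing Nones drain the pipe
        leaving = stages[-1]
        if leaving is not None:
            addr, value, suppressed = leaving
            if suppressed:
                # Final stage: no memory write, commit the corresponding indicator.
                context_status["receive_exception"] = True
            else:
                memory[addr] = value
        stages = [None] + stages[:-1]           # advance every packet one stage
        if packet is not None:
            addr, value = packet
            stages[0] = (addr, value, context_status["execution_exception"])


memory = {}
status = {"execution_exception": False, "receive_exception": False}
run_receive_pipeline([(0, "a")], memory, status)      # normal case: data is stored
status["execution_exception"] = True                  # execution pipeline faults
run_receive_pipeline([(1, "b")], memory, status)      # write is suppressed
```

After the second call, address 1 is never written and the receive-side indicator is set only when the packet reaches the final stage.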
Utilizing pipeline registers as intermediate storage
In one example, a method includes, responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR: copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register; copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register; copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR; and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
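The four-cycle sequence can be traced with a small model. It assumes a pipeline of exactly two registers (an initial and a final one) so that a value captured in cycle 1 is visible in the final register in cycle 3. The function name `move_via_pipeline`, the register names, and the trace format are hypothetical.

```python
def move_via_pipeline(gprs, moves):
    """Carry out (src, dst) GPR moves through a two-register pipeline.

    Returns a trace of (cycle, action, register) tuples; cycles start at 1.
    """
    initial_reg = None   # written by the initial logic unit
    final_reg = None     # read by the final logic unit
    trace = []
    for cycle in range(1, len(moves) + 3):
        # Final logic unit: copy the value in the final pipeline register to its GPR.
        if final_reg is not None:
            dst, value = final_reg
            gprs[dst] = value
            trace.append((cycle, "write", dst))
        # Clock edge: the initial register's contents advance to the final register.
        final_reg = initial_reg
        # Initial logic unit: copy the next source value into the initial register.
        idx = cycle - 1
        if idx < len(moves):
            src, dst = moves[idx]
            initial_reg = (dst, gprs[src])
            trace.append((cycle, "read", src))
        else:
            initial_reg = None
    return trace


gprs = {"r1": 10, "r2": 20, "r3": 0, "r4": 0}
trace = move_via_pipeline(gprs, [("r1", "r3"), ("r2", "r4")])
```

For the two moves in the abstract, the trace shows reads in cycles 1 and 2 and writes in cycles 3 and 4, with no additional GPR used as temporary storage.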
DATA PROCESSING APPARATUS, DATA PROCESSING METHOD AND PROGRAM
A data processing device includes a data input unit and a processor. The processor includes a division unit, a first storage unit and a second storage unit which each have a plurality of storage areas, a write unit, a calculation unit, and a control unit. The division unit divides a data series input by the data input unit to generate a plurality of divided data. The write unit writes the divided data to the first storage unit according to a writing order of the storage areas in the first storage unit. The calculation unit performs calculation processing on the divided data written to the first storage unit, and writes calculated data obtained by the calculation processing to the second storage unit according to a writing order of the storage areas in the second storage unit. The control unit controls processing of the write unit and processing of the calculation unit, which are divided into different processing lines, to be executed in parallel by pipeline processing.
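The overlap between the write unit and the calculation unit can be sketched as a step-by-step simulation. The round-robin writing order, the requirement of at least two storage areas, and the names `divide` and `run_pipeline` are assumptions for illustration; the two stages are interleaved within one loop rather than run on real parallel hardware.

```python
def divide(series, size):
    """Division unit: split the input series into chunks of `size` items."""
    return [series[i:i + size] for i in range(0, len(series), size)]


def run_pipeline(series, chunk_size, n_areas, calc):
    """Each step, the calculation unit handles chunk k-1 while chunk k is written."""
    assert n_areas >= 2, "need two areas so reading and writing never collide"
    chunks = divide(series, chunk_size)
    first = [None] * n_areas    # first storage unit (input areas)
    second = [None] * n_areas   # second storage unit (result areas)
    results = []
    for step in range(len(chunks) + 1):
        # Calculation unit: process the chunk written during the previous step.
        if step >= 1:
            area = (step - 1) % n_areas
            second[area] = calc(first[area])
            results.append(second[area])
        # Write unit: store the next chunk in the next area (assumed round-robin order).
        if step < len(chunks):
            first[step % n_areas] = chunks[step]
    return results


sums = run_pipeline([1, 2, 3, 4, 5, 6], chunk_size=2, n_areas=2, calc=sum)
```

With two storage areas, the chunk being calculated and the chunk being written always occupy different areas, which is what lets the two processing lines proceed in the same step.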
Scalable sparse matrix multiply acceleration using systolic arrays with feedback inputs
Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.