Patent classifications
G06F9/3859
REDUCED MEMORY WRITE REQUIREMENTS IN A SYSTEM ON A CHIP USING AUTOMATIC STORE PREDICATION
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Handling load-exclusive instructions in apparatus having support for transactional memory
An apparatus is described with support for transactional memory and load/store-exclusive instructions using an exclusive monitor indication to track exclusive access to a given address. In response to a predetermined type of load instruction specifying a load target address, which is executed within a given transaction, any exclusive monitor indication previously set for the load target address is cleared. In response to a load-exclusive instruction, an abort is triggered for a transaction for which the given address is specified as one of its working set of addresses. This helps to maintain mutual exclusion between transactional and non-transactional threads even if there is load speculation in the non-transactional thread.
POST-RETIRE SCHEME FOR TRACKING TENTATIVE ACCESSES DURING TRANSACTIONAL EXECUTION
A method and apparatus for post-retire transaction access tracking is herein described. Load and store buffers are capable of storing senior entries. In the load buffer a first access is scheduled based on a load buffer entry. Tracking information associated with the load is stored in a filter field in the load buffer entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler schedules a post-retire access to update transaction tracking information, if the filter field does not represent that the tracking information has already been updated during a pendency of the transaction. Before evicting a line in a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.
INFERRING FUTURE VALUE FOR SPECULATIVE BRANCH RESOLUTION IN A MICROPROCESSOR
A system, processor, programming product and/or method including: an instruction dispatch unit configured to dispatch instructions of a compare immediate-conditional branch instruction sequence; and a compare register having at least one entry to hold information in a plurality of fields. Operations include: writing information from a first instruction of the compare immediate-conditional branch instruction sequence into one or more of the plurality of fields in an entry in the compare register; writing an immediate field and the ITAG of a compare immediate instruction into the entry in the compare register; writing, in response to dispatching a conditional branch instruction, an inferred compare result value into the entry in the compare register; comparing a computed compare result value to the inferred compare result value stored in the entry in the compare register; and not execute the compare immediate instruction or the conditional branch instruction.
DETERMINING A RESTART POINT IN OUT-OF-ORDER EXECUTION
There is provided a data processing apparatus comprising decode circuitry responsive to receipt of a block of instructions to generate control signals indicative of each of the block of instructions, and to analyse the block of instructions to detect a potential hazard instruction. The data processing apparatus is provided with decode circuitry to encode information indicative of a clean restart point into the control signals associated with the potential hazard instruction. The data processing apparatus is provided with data processing circuitry to perform out-of-order execution of at least some of the block of instructions, and control circuitry responsive to a determination, at execution of the potential hazard instruction, that data values used as operands for the potential hazard instruction have been modified by out-of-order execution of a subsequent instruction, to restart execution from the clean restart point and to flush held data values from the data processing circuitry.
Systems, methods, and apparatuses for heterogeneous computing
- Rajesh M. Sankaran ,
- Gilbert Neiger ,
- Narayan Ranganathan ,
- Stephen R. Van Doren ,
- Joseph Nuzman ,
- Niall D. McDonnell ,
- Michael A. O'Hanlon ,
- Lokpraveen B. Mosur ,
- Tracy Garrett Drysdale ,
- Eriko Nurvitadhi ,
- Asit K. Mishra ,
- Ganesh Venkatesh ,
- Deborah T. Marr ,
- Nicholas P. Carter ,
- Jonathan D. Pearce ,
- Edward T. Grochowski ,
- Richard J. Greco ,
- Robert Valentine ,
- Jesus Corbal ,
- Thomas D. Fletcher ,
- Dennis R. Bradford ,
- Dwight P. Manley ,
- Mark J. Charney ,
- Jeffrey J. Cook ,
- Paul Caprioli ,
- Koichi Yamada ,
- Kent D. Glossop ,
- David B. Sheffield
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
Method of completing a programmable atomic transaction by ensuring memory locks are cleared
Disclosed is an instruction for a programmable atomic transaction that is executed as the last instruction and that terminates the executing thread, waits for all outstanding store operations to finish, clears the programmable atomic lock, and sends a completion response back to the issuing process. This guarantees that the programmable atomic lock is cleared when the transaction completes. By coupling thread termination with clearing the lock bit, this guarantees that the thread cannot terminate without clearing the lock.
Responding to branch misprediction for predicated-loop-terminating branch instruction
A predicated-loop-terminating branch instruction controls, based on whether a loop termination condition is satisfied, whether the processing circuitry should process a further iteration of a predicated loop body or process a following instruction. If at least one unnecessary iteration of the predicated loop body is processed following a mispredicted-non-termination branch misprediction when the loop termination condition is mispredicted as unsatisfied for a given iteration when it should have been satisfied, processing of the at least one unnecessary iteration of the predicated loop body is predicated to suppress an effect of the at least one unnecessary iteration. When the mispredicted-non-termination branch misprediction is detected for the given iteration of the predicated-loop-terminating branch instruction, in response to determining that a flush suppressing condition is satisfied, flushing of the at least one unnecessary iteration of the predicated loop body is suppressed as a response to the mispredicted-non-termination branch misprediction.
APPARATUS AND METHOD FOR HANDLING MEMORY LOAD REQUESTS
When load requests are generated to support data processing operations, the load requests are buffered in pending load buffer circuitry prior to being carried out. Coalescing circuitry determines for a first load request whether a set of one or more subsequent load requests buffered in the pending load buffer circuitry satisfies an address proximity condition. The address proximity condition is satisfied when all data items identified by the set of one or more subsequent load requests are comprised within a series of data items which will be retrieved from the memory system in response to the first load request. When the address proximity condition is satisfied, forwarding of the set of one or more subsequent load requests is suppressed.