Patent classifications
G06F9/3834
Handling load-exclusive instructions in apparatus having support for transactional memory
An apparatus is described with support for transactional memory and load/store-exclusive instructions using an exclusive monitor indication to track exclusive access to a given address. In response to a predetermined type of load instruction specifying a load target address, which is executed within a given transaction, any exclusive monitor indication previously set for the load target address is cleared. In response to a load-exclusive instruction, an abort is triggered for a transaction for which the given address is specified as one of its working set of addresses. This helps to maintain mutual exclusion between transactional and non-transactional threads even if there is load speculation in the non-transactional thread.
POST-RETIRE SCHEME FOR TRACKING TENTATIVE ACCESSES DURING TRANSACTIONAL EXECUTION
A method and apparatus for post-retire transaction access tracking is herein described. Load and store buffers are capable of storing senior entries. In the load buffer a first access is scheduled based on a load buffer entry. Tracking information associated with the load is stored in a filter field in the load buffer entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler schedules a post-retire access to update transaction tracking information, if the filter field does not represent that the tracking information has already been updated during a pendency of the transaction. Before evicting a line in a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.
Methods for updating reference count and shared objects in a concurrent system
A method for to manage concurrent access to a shared resource in a distributed computing environment. A reference counter counts is incremented for every use of an object subtype in a session and decremented for every release of an object subtype in a session. A session counter is incremented upon the first instance of fetching an object type into a session cache and decremented upon having no instances of the object type in use in the session. When both the reference counter and the session counter are zero, the object type may be removed from the cache.
COOPERATIVE GARBAGE COLLECTION BARRIER ELISION
Techniques are disclosed for eliding load and store barriers while maintaining garbage collection invariants. Embodiments described herein include techniques for identifying an instruction, such as a safepoint poll, that checks whether to pause a thread between execution of a dominant and dominated access to the same data field. If a poll instruction is identified between the two data accesses, then a pointer for the data field may be recorded in an entry associated with the poll instruction. When the thread is paused to execute a garbage collection operation, the recorded information may be used to update values associated with the data field in memory such that the dominated access may be executed without any load or store barriers.
PROCESSING WORK ITEMS IN PROCESSING LOGIC
A plurality of work items are processed through a processing pipeline comprising a plurality of stages in processing logic. The processing of a work item includes: (i) reading data in accordance with a memory address associated with the work item, (ii) updating the read data, and (iii) writing the updated data in accordance with the memory address associated with the work item. The method includes processing a first work item and a second work item through the processing pipeline, wherein the processing of the first work item through the pipeline is initiated earlier than the processing of the second work item, and where it is determined that the first and second work items are associated with the same memory address, first updated data of the first work item is written to a register in the processing logic, and the processing of the second work item comprises reading the first updated data from the register instead of reading data from the memory.
Graphics systems and methods for accelerating synchronization using fine grain dependency check and scheduling optimizations based on available shared memory space
Accelerated synchronization operations using fine grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine availability of at least one execution unit. The synchronization circuitry to perform a fine grain dependency check of availability of dependent data or operands in shared local memory or cache when at least one execution unit is available.
PROCESSOR CIRCUIT AND DATA PROCESSING METHOD
A processor circuit includes an instruction decode unit, an instruction detector, an address generator and a data buffer. The instruction decode unit is configured to decode a first load instruction included in a plurality of load instructions to generate a first decoding result. The instruction detector, coupled to the instruction decode unit, is configured to detect if the load instructions use a same register. The address generator, coupled to the instruction decode unit, is configured to generate a first address requested by the first load instruction according to the first decoding result. The data buffer is coupled to the instruction detector and the address generator. When the instruction detector detects that the load instructions use the same register, the data buffer is configured to store the first address generated from the address generator, and store data requested by the first load instruction according to the first address.
HANDLING OF SINGLE-COPY-ATOMIC LOAD/STORE INSTRUCTION
In response to a single-copy-atomic load/store instruction for requesting an atomic transfer of a target block of data between the memory system and the registers, where the target block has a given size greater than a maximum data size supported for a single load/store micro-operation by a load/store data path, instruction decoding circuitry maps the single-copy-atomic load/store instruction to two or more mapped load/store micro-operations each for requesting transfer of a respective portion of the target block of data. In response to the mapped load/store micro-operations, load/store circuitry triggers issuing of a shared memory access request to the memory system to request the atomic transfer of the target block of data of said given size to or from the memory system, and triggers separate transfers of respective portions of the target block of data over the load/store data path.
MEMORY BARRIER ELISION FOR MULTI-THREADED WORKLOADS
A system includes a memory, at least one physical processor in communication with the memory, and a plurality of threads executing on the at least one physical processor. A first thread of the plurality of threads is configured to execute a plurality of instructions that includes a restartable sequence. Responsive to a different second thread in communication with the first thread being pre-empted while the first thread is executing the restartable sequence, the first thread is configured to restart the restartable sequence prior to reaching a memory barrier.
Queues for inter-pipeline data hazard avoidance
Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted to indicate that there is hazard related to the primary instruction, and when primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved. When a secondary instruction is output by the decoder for execution, the secondary instruction is stalled in a queue associated with the appropriate instruction pipeline if at least one counter associated with the primary instructions from which it depends indicates that there is a hazard related to the primary instruction.