Patent classifications
G06F9/3834
VALIDATION OF STORE COHERENCE RELATIVE TO PAGE TRANSLATION INVALIDATION
Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING SIMULTANEOUS TWO-TARGET LOADS AND STORES
Operation of a multi-slice processor that includes a plurality of execution slices and a load/store superslice, where the load/store superslice includes a set predict array, a first load/store slice, and a second load/store slice. Operation of such a multi-slice processor includes: receiving a two-target load instruction directed to the first load/store slice and a store instruction directed to the second load/store slice; determining a first subset of ports of the set predict array as inputs for an effective address for the two-target load instruction; determining a second subset of ports of the set predict array as inputs for an effective address for the store instruction; and generating, in dependence upon logic corresponding to the set predict array that is less than logic implementing an entire load/store slice, output for performing the two-target load instruction in parallel with generating output for performing the store instruction.
PERFORMING A SYNCHRONIZATION OPERATION ON AN ELECTRONIC DEVICE
In an illustrative example, a method of operation of an electronic device includes identifying a plurality of threads. Each thread of the plurality of threads is configured to execute a plurality of instructions including a barrier instruction corresponding to a target of a synchronization operation. The method further includes selecting a master thread to perform one or more operations associated with the synchronization operation. The method also includes providing an indication of a number of threads included in the plurality of threads to the master thread.
HAZARD GENERATING FOR SPECULATIVE CORES IN A MICROPROCESSOR
A system, mechanism, tool, programming product, processor, and/or method for generating a hazard in a processor includes: identifying one or more cache lines to invalidate in a second level memory of a processing core in the processor; invalidating, in response to identifying one or more cache lines to invalidate in the second level cache, the one or more identified cache lines in the second level memory; and invalidating, in response to invalidating the one or more identified cache lines in the second level memory, the corresponding one or more cache lines in a first level memory. In an aspect the hazard generating mechanism is triggered, preferably on demand, and includes in an approach searching for cache lines in the second level memory that are also in the first level memory.
Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.
Apparatuses, methods, and systems for access synchronization in a shared memory
Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.
PROCESSOR THAT INCLUDES A SPECIAL STORE INSTRUCTION USED IN REGIONS OF A COMPUTER PROGRAM WHERE MEMORY ALIASING MAY OCCUR
Processor hardware detects when memory aliasing occurs, and assures proper operation of the code even in the presence of memory aliasing. The processor defines a special store instruction that is different from a regular store instruction. The special store instruction is used in regions of the computer program where memory aliasing may occur. Because the hardware can detect and correct for memory aliasing, this allows a compiler to make optimizations such as register promotion even in regions of the code where memory aliasing may occur.
Instruction and Logic for Total Store Elimination
A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.
PROCESSOR THAT DETECTS MEMORY ALIASING IN HARDWARE AND ASSURES CORRECT OPERATION WHEN MEMORY ALIASING OCCURS
Processor hardware detects when memory aliasing occurs, and assures proper operation of the code even in the presence of memory aliasing. Because the hardware can detect and correct for memory aliasing, this allows a compiler to make optimizations such as register promotion even in regions of the code where memory aliasing can occur. The result is code that is more optimized and therefore runs faster.
OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING LOAD-HIT-STORE HANDLING
Operation of a multi-slice processor that includes a plurality of execution slices and an instruction sequencing unit. Operation of such a multi-slice processor includes: receiving, at the instruction sequencing unit, a load instruction indicating load address data and a load data length; determining a previous store instruction in an issue queue such that store address data for the previous store instruction corresponds to the load address data, wherein the previous store instruction corresponds to a store data length; and generating, in dependence upon the store data length matching the load data length, an indication in the issue queue that indicates a dependency between the load instruction and the previous store instruction.